arxiv: 2605.08098 · v1 · submitted 2026-04-16 · 💻 cs.LG

Reinforcement learning for inverse structural design and rapid laser cutting of kirigami prototypes

Pith reviewed 2026-05-12 00:44 UTC · model grok-4.3

classification 💻 cs.LG
keywords kirigami · inverse design · reinforcement learning · conditional flow matching · metamaterials · laser cutting · deployable structures

The pith

A reinforcement learning approach generates geometrically valid kirigami cut patterns from target shapes using a single simulation call.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RL-Kirigami to solve inverse design for kirigami metamaterials by first training an optimal-transport conditional flow matching model that proposes ratio fields for cut layouts. Group Relative Policy Optimization then tunes those proposals against rewards that score silhouette overlap with the target, satisfaction of discrete compatibility rules, and smoothness of the ratio field. A marching decoder built into the generator enforces global geometric consistency so that every output layout remains valid for deployment. Across procedurally generated test shapes the pretrained prior alone reaches 94.2 percent silhouette IoU while using only one forward simulation instead of hundreds. GRPO raises accuracy to 94.91 percent and, when regularity is rewarded, lowers total variation of the ratio field from 0.95 to 0.81 with almost no loss in match quality. The same layouts are exported directly to DXF files and laser-cut from thin polymer sheets to produce working prototypes in roughly eight minutes.
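As an editorial illustration, the flow-matching idea behind the prior can be sketched on a toy vector. The sketch assumes the straight-path formulation described in the paper's Figure 3 caption (regress the velocity from a base sample x0 to a feasible design x1, then integrate the learned field at inference) and uses 8 Euler steps of size 1/8 as a plausible default. The function names and the `oracle_velocity` stand-in for a trained network are hypothetical; the optimal-transport pairing of base samples with designs and the conditioning on the target silhouette are not shown.

```python
def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target along the straight path: u = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity, x0, steps=8):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy data (assumption): a 3-entry "ratio field" flattened to a list.
x0 = [0.0, 0.0, 0.0]   # base sample
x1 = [0.3, 0.5, 0.7]   # feasible design the path should reach


def oracle_velocity(x, t):
    """Hypothetical stand-in for a trained network: the exact conditional
    velocity that points from the current state toward the known endpoint x1."""
    return [(b - xi) / (1.0 - t) for xi, b in zip(x, x1)]


midpoint = interpolate(x0, x1, 0.5)
sample = euler_sample(oracle_velocity, x0, steps=8)
```

With the oracle field the velocity is constant along the path, so the 8 Euler steps land on x1 up to floating-point error; a trained network would only approximate this.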

Core claim

RL-Kirigami combines a pretrained OT-CFM prior with GRPO to generate compatible ratio fields for compact reconfigurable parallelogram quad kirigami. A marching decoder enforces global geometric compatibility during generation. On procedurally generated targets, a single sample from the prior yields 94.2 percent sIoU and outperforms solver baselines with only one forward evaluation instead of hundreds. GRPO boosts this to 94.91 percent sIoU; adding a regularity reward lowers total variation of the ratio field from 0.95 to 0.81 at nearly the same accuracy. The designs are laser-cut from 50 micrometer polymer sheets to yield working prototypes in roughly eight minutes each.
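The two headline metrics can be made concrete with a short editorial sketch. It assumes that sIoU is the intersection-over-union of binary silhouette masks on a shared grid and that TV(𝐱) is the sum of absolute differences between neighboring ratio values. This page does not state the paper's exact definitions or normalization, so the function names and both definitions below are assumptions.

```python
def silhouette_iou(mask_a, mask_b):
    """Intersection-over-union of two equally sized binary masks."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as a perfect match (a convention, not a paper detail).
    return inter / union if union else 1.0


def total_variation(field):
    """Anisotropic total variation: sum of |difference| over vertical and
    horizontal neighbor pairs of a 2D field."""
    rows, cols = len(field), len(field[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(field[i + 1][j] - field[i][j])
            if j + 1 < cols:
                tv += abs(field[i][j + 1] - field[i][j])
    return tv


# Toy masks (made up for illustration): the prediction misses one target cell.
target = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
score = silhouette_iou(target, pred)                      # 3 shared cells / 4 in the union = 0.75

flat_tv = total_variation([[0.5, 0.5], [0.5, 0.5]])       # 0.0 for a constant field
checker_tv = total_variation([[0.0, 1.0], [1.0, 0.0]])    # 4.0 for a 2x2 checkerboard
```

Under this reading, a lower TV means neighboring cells have similar ratios, which is the sense in which the regularity reward makes the field smoother.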

What carries the argument

The OT-CFM generator of ratio fields, the marching decoder that propagates compatibility constraints across the grid, and the GRPO policy optimizer driven by nondifferentiable rewards for shape fidelity, non-overlap, and regularity.
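A minimal sketch of the group-relative step in GRPO may help here. For one target, a group of candidate designs is scored, and each reward is standardized against the group's mean and standard deviation to give an advantage. The composite reward below (zero for infeasible layouts, otherwise sIoU minus a weighted TV penalty), the weight value, and the candidate numbers are all hypothetical; the page does not give the paper's actual reward formula.

```python
import statistics


def composite_reward(siou, feasible, tv, tv_weight=0.1):
    """Hypothetical scalar reward: infeasible layouts get 0, feasible ones
    get shape fidelity minus a regularity penalty."""
    if not feasible:
        return 0.0
    return siou - tv_weight * tv


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up candidates for a single target: (sIoU, feasible, TV).
candidates = [
    (0.90, True, 1.0),
    (0.92, True, 0.5),
    (0.80, True, 0.2),
    (0.95, False, 0.3),  # best silhouette match, but violates a constraint
]
rewards = [composite_reward(s, f, tv) for s, f, tv in candidates]
advantages = group_relative_advantages(rewards)
```

Candidates with positive advantage would have their likelihood increased by the policy update and those with negative advantage decreased; the policy-gradient and clipping terms of GRPO are outside this sketch.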

If this is right

  • Only one call to the forward simulator is needed to produce a high-accuracy design.
  • Silhouette intersection-over-union reaches 94.91 percent on held-out procedural targets after GRPO fine-tuning.
  • Ratio-field regularity can be improved without sacrificing matching accuracy.
  • Valid layouts are produced in DXF format ready for laser cutting.
  • The overall method yields a complete manufacturing-aware pipeline from target shape to physical part.
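To show what "exported directly to DXF" can look like in its simplest form, here is a sketch that serializes straight cut segments as LINE entities in a minimal ASCII DXF containing only an ENTITIES section. It is not the paper's exporter: the connector markers and path merging described for Figure 11 are not reproduced, the function name and layer name are hypothetical, and some cutter software may require header, units, or layer tables that this sketch omits.

```python
def segments_to_dxf(segments, layer="CUT"):
    """Serialize (x1, y1, x2, y2) segments as LINE entities.

    DXF is a sequence of (group code, value) pairs on alternating lines:
    8 = layer name, 10/20 = start x/y, 11/21 = end x/y.
    """
    out = ["0", "SECTION", "2", "ENTITIES"]
    for x1, y1, x2, y2 in segments:
        out += [
            "0", "LINE",
            "8", layer,
            "10", f"{x1:.4f}", "20", f"{y1:.4f}",
            "11", f"{x2:.4f}", "21", f"{y2:.4f}",
        ]
    out += ["0", "ENDSEC", "0", "EOF"]
    return "\n".join(out) + "\n"


# Two made-up cut segments, in whatever drawing units the cutter expects.
cuts = [(0.0, 0.0, 10.0, 0.0), (10.0, 0.0, 10.0, 5.0)]
dxf_text = segments_to_dxf(cuts)

# Writing to disk is left to the caller, for example:
# with open("layout.dxf", "w") as fh:
#     fh.write(dxf_text)
```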

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pretrain-plus-RL structure might transfer to other inverse problems with discrete geometric constraints such as origami folding or truss design.
  • Interactive design tools could let users sketch a target and receive a cuttable pattern in seconds once the prior is trained.
  • Physical testing on shapes outside the procedural distribution would test whether the learned distribution covers practical engineering needs.
  • Reducing the number of simulator calls opens the door to incorporating more expensive physics-based rewards in future versions.

Load-bearing premise

The procedurally generated target shapes and the combination of marching decoder with the three rewards together capture every relevant geometric and manufacturing constraint that would arise in real deployment.

What would settle it

A laser-cut prototype from a model-generated layout that either overlaps, fails to deploy to the target silhouette, or violates the parallelogram quad compatibility rules when assembled.

Figures

Figures reproduced from arXiv: 2605.08098 by Dena Shahriari, Milad Yazdani, Shahriar Shalileh (M. Yazdani et al., preprint submitted to Elsevier).

Figure 1: Closed-loop RL-Kirigami fine-tunes the OT-CFM generator with rewards from the forward simulator and evaluator. For a target deployed silhouette 𝐲, the generator proposes a kirigami cut pattern, and the environment simulates 𝐲̂ for comparison with 𝐲. Group Relative Policy Optimization (GRPO) updates the generator to increase the likelihood of samples that satisfy constraints and match the target.

Figure 2: Kirigami sheet and local notation in the negative space view. Left: sheet with an 𝑚 × 𝑛 array of voids. Right: void (𝑖, 𝑗) with corner vertices 𝐩⁰ᵢⱼ, …, 𝐩³ᵢⱼ (blue), side lengths 𝑎ᵢⱼ and 𝑏ᵢⱼ, and deployment angle 𝜙ᵢⱼ at vertex 0, used by the marching decoder (Eq. (2)).

Figure 3: During training, the model learns the velocity along the straight path from a base sample 𝐱₀ to a feasible design 𝐱₁. At inference, the learned field transports a new base sample toward 𝑃 (𝐱 ∣ 𝐲, 𝜑, ).

Figure 4: Representative target silhouettes together with the compact rectangle and converted target layouts used to condition the OT-CFM prior.

Figure 5: Wall clock time vs. grid size for the solver experiment. Solver baselines use the same tolerance-based solver setup as …

Figure 6: Best-of-𝐾 sensitivity on the full test split. Left: mean sIoU vs. 𝐾. Right: total wall clock time per target on a log scale. For the OT-CFM prior, 𝐾 is the number of generated candidates per target. For solver baselines, 𝐾 is the number of independent runs per target.

Figure 7: Inference-step sensitivity for Diffusion and the OT-CFM prior. Left: sIoU vs. steps. Right: success rate vs. steps. Star markers show the reported settings. Square markers show the 1-step cGAN baseline.

Figure 8: Three-sample visual comparison for the conditional generators in …

Figure 9: One selected target used to illustrate the effect of regularity preference in RL. Left: target silhouette. Middle: accuracy-only RL output. Right: accuracy plus regularity RL output. Panel titles report the corresponding sIoU and TV(𝐱). Lower TV(𝐱) indicates higher ratio-field regularity.

Figure 10: RL training-budget sensitivity. Left: sIoU vs. training environment calls. Right: TV(𝐱) vs. training environment calls. Star markers show the values reported in …

Figure 11: DXF export used for prototype fabrication. (a) A decoded layout from one generated ratio field 𝐱. (b) The corresponding cutter-ready DXF file, with the final cut path, connector markers, and the local connector cut highlighted in red. (c) Zoomed connector detail. Gray shows the raw local cuts, black shows the final DXF path, and red shows the connector cut kept around the marker.

Figure 12: Rapid in situ prototyping on polyamide (PA) by laser cutting. (A) Heart, (B) Hexagonal, and (C) Star prototypes, each shown in a compact state (left) and the deployed prototype state (right). Scale bars: 1 cm.
Original abstract

Kirigami is an increasingly useful fabrication method to produce shape-programmable metamaterial structures. However, inverse design remains difficult because deployment is nonlinear, and feasible cut layouts must satisfy discrete compatibility rules, avoid overlap, and map one target shape to valid designs. We present RL-Kirigami, an inverse design framework that combines optimal-transport conditional flow matching (OT-CFM) with reinforcement learning to generate compatible ratio fields for compact reconfigurable parallelogram quad kirigami. A marching decoder enforces global geometric compatibility, and Group Relative Policy Optimization (GRPO) aligns the generator with nondifferentiable rewards for silhouette matching, feasibility, and ratio-field regularity. Across procedurally generated target shape instances, a single sample from the pretrained OT-CFM prior reached $94.2\%$ sIoU and outperformed solver baselines while reducing forward simulator evaluations from hundreds to 1. GRPO improved accuracy to $94.91\%$ sIoU and, with regularity included, reduced $\mathrm{TV}(\mathbf{x})$ from 0.95 to 0.81 while maintaining $94.83\%$ sIoU. Generated layouts were exported to DXF and laser-cut in $50~\mu\mathrm{m}$ polymeric sheets to produce deployable prototypes in $8.0 \pm 1.0$ minutes per part. These results support a manufacturing-aware inverse design workflow for deployable kirigami metamaterials under hard geometric feasibility constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents RL-Kirigami, a framework that combines optimal-transport conditional flow matching (OT-CFM) with Group Relative Policy Optimization (GRPO) for inverse design of compact reconfigurable parallelogram quad kirigami. A marching decoder enforces global geometric compatibility on ratio fields, while nondifferentiable rewards target silhouette matching, feasibility, and regularity. On procedurally generated target shapes, a single OT-CFM sample achieves 94.2% sIoU and outperforms solver baselines with one forward evaluation; GRPO raises this to 94.91% sIoU, and adding regularity reduces TV(x) from 0.95 to 0.81 while retaining 94.83% sIoU. Generated layouts are exported to DXF and laser-cut into 50 μm polymeric sheets, yielding deployable prototypes in 8.0 ± 1.0 minutes per part.

Significance. If the central performance claims hold under more rigorous validation, the work offers a computationally efficient, manufacturing-aware pipeline for constrained inverse design of deployable metamaterials, reducing simulator calls from hundreds to one while incorporating physical fabrication. The integration of a pretrained generative prior with RL under hard geometric constraints and the successful rapid prototyping are notable strengths that could influence workflows in kirigami and related fabrication domains.

major comments (3)
  1. [Abstract / Results] Abstract and results: The headline metrics (94.2% sIoU from one OT-CFM sample, 94.91% after GRPO, TV(x) drop to 0.81) are reported as single point values without error bars, standard deviations, number of test instances, or statistical tests comparing to baselines. This leaves the outperformance and regularity claims only partially supported, as the abstract notes concrete numbers but supplies no variance or protocol details.
  2. [Methods / Reward formulation] Reward design and methods: The manuscript states that GRPO aligns the generator with nondifferentiable rewards including a regularity term, yet provides no details on how the post-hoc regularity weights were selected, no ablation studies on their impact, and no sensitivity analysis. This choice directly affects the reported TV(x) reduction and must be justified for the central claim of improved regularity without accuracy loss.
  3. [Fabrication / Experimental validation] Fabrication and evaluation: While the abstract reports successful laser-cut prototypes in 8 minutes, no quantitative deployment success rates, failure-mode analysis (e.g., buckling, edge quality, tolerance effects), or results on non-procedurally generated targets are supplied. The marching decoder and rewards are asserted to capture all constraints, but the evaluation remains confined to synthetic procedural shapes, weakening the manufacturing-aware claim.
minor comments (2)
  1. [Abstract] Notation: sIoU and TV(x) appear without explicit definitions or references in the abstract; these should be defined at first use or in a preliminary section for clarity.
  2. [Abstract] The ±1.0 minute fabrication time is given without specifying how many parts were timed or the measurement protocol.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and have revised the manuscript accordingly where additional analysis or clarification was feasible. We note limitations in the current scope of experiments.

point-by-point responses
  1. Referee: [Abstract / Results] Abstract and results: The headline metrics (94.2% sIoU from one OT-CFM sample, 94.91% after GRPO, TV(x) drop to 0.81) are reported as single point values without error bars, standard deviations, number of test instances, or statistical tests comparing to baselines. This leaves the outperformance and regularity claims only partially supported, as the abstract notes concrete numbers but supplies no variance or protocol details.

    Authors: We agree that variance information and statistical details are important for supporting the performance claims. In the revised manuscript we will specify the number of procedurally generated test instances, report standard deviations for all headline metrics (sIoU and TV(x)), include error bars on the relevant result figures, and add statistical significance tests (paired Wilcoxon signed-rank tests) against the solver baselines. These updates will appear in both the abstract and the results section. revision: yes

  2. Referee: [Methods / Reward formulation] Reward design and methods: The manuscript states that GRPO aligns the generator with nondifferentiable rewards including a regularity term, yet provides no details on how the post-hoc regularity weights were selected, no ablation studies on their impact, and no sensitivity analysis. This choice directly affects the reported TV(x) reduction and must be justified for the central claim of improved regularity without accuracy loss.

    Authors: The regularity weight was selected via a small grid search on a held-out validation subset of procedural shapes to maintain high sIoU while lowering total variation. We acknowledge that the original manuscript omitted the selection procedure and supporting ablations. In the revision we will add an explicit description of the weight-selection process, an ablation table showing sIoU and TV(x) for several weight values, and a brief sensitivity plot; these will be placed in the methods section and supplementary material. revision: yes

  3. Referee: [Fabrication / Experimental validation] Fabrication and evaluation: While the abstract reports successful laser-cut prototypes in 8 minutes, no quantitative deployment success rates, failure-mode analysis (e.g., buckling, edge quality, tolerance effects), or results on non-procedurally generated targets are supplied. The marching decoder and rewards are asserted to capture all constraints, but the evaluation remains confined to synthetic procedural shapes, weakening the manufacturing-aware claim.

    Authors: The reported fabrication time already includes a standard deviation obtained from repeated cuts. The current quantitative evaluation deliberately uses procedurally generated targets to enable controlled, reproducible comparison with solver baselines. We have added a limitations paragraph that discusses observed fabrication issues (minor edge fraying and tolerance sensitivity) and the scope of the present validation. Comprehensive success-rate statistics and experiments on arbitrary real-world targets would require a substantially larger physical test campaign that lies outside the present study. revision: partial
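The paired Wilcoxon signed-rank test mentioned in the first response can be sketched with the standard library alone. The version below ranks the absolute paired differences (averaging ranks over ties), takes the smaller of the positive and negative rank sums as the statistic, and uses a normal approximation without tie or continuity correction for the two-sided p-value. The per-target scores are made up for illustration and are not results from the paper; in practice `scipy.stats.wilcoxon` would be the usual choice, and the approximation is rough for small samples.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    stat = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (stat - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return stat, p_value


# Made-up per-target scores in percent (illustration only, not paper data).
method = [94, 95, 93, 96, 92, 95, 94, 97, 93, 96]
baseline = [90, 93, 92, 91, 89, 90, 93, 92, 90, 89]
stat, p_value = wilcoxon_signed_rank(method, baseline)
```

Because every made-up difference is positive, the statistic is 0 and the approximate p-value falls below 0.01; real paired results would of course need the actual per-target scores from both methods.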

standing simulated objections not resolved
  • Quantitative deployment success rates and systematic failure-mode analysis across a large set of physical prototypes
  • Performance results on non-procedurally generated target shapes

Circularity Check

0 steps flagged

No significant circularity in method or results

full rationale

The paper describes an empirical pipeline that pretrains an OT-CFM prior, applies a marching decoder for geometric compatibility, and uses GRPO to optimize against external nondifferentiable rewards for silhouette, feasibility, and regularity. Performance numbers (94.2% sIoU from the prior, 94.91% after GRPO) are measured on procedurally generated target shapes and are not shown to reduce by the paper's own equations to quantities defined solely in terms of fitted parameters or self-citations. The central claims remain falsifiable against the held-out procedural test set and physical prototypes without tautological equivalence to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Framework rests on standard machine-learning components; no explicit free parameters, axioms, or new physical entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5567 in / 1195 out tokens · 49067 ms · 2026-05-12T00:44:56.688837+00:00 · methodology

Extracted experimental details

The solver rows use the same tolerance-based solver settings as Table 1, averaged over the three masks. The OT-CFM row instantiates the shared mask-conditioned U-Net backbone at each square grid size and measures one Euler-8 OT-CFM sample plus evaluation per target without retraining. Sec. 3.3 uses the full test split. Sec. 3.4 reports one final evaluatio...

OT-CFM uses this backbone with OT coupling, meaning optimal transport pairing between base samples and training designs during flow matching training. The final OT-CFM run uses Euler sampling with 9 time points and step size 1/8, learning rate 2 × 10⁻⁵, weight decay 0.05, stochastic weight averaging, batch size 64, and 400 training epochs. The diffusion base...