TORA: Topological Representation Alignment for 3D Shape Assembly
Pith reviewed 2026-05-13 17:06 UTC · model grok-4.3
The pith
TORA aligns flow models to frozen 3D encoders to speed assembly training and improve accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TORA introduces a topology-first representation alignment framework that distills relational structure from a frozen pretrained 3D encoder into the flow-matching backbone during training. It realizes this first through token-wise cosine matching of learned geometric descriptors and then through a Centered Kernel Alignment loss to match similarity structures, with alignment most effective at later transformer layers where spatial relations emerge. Geometry- and contact-centric teacher properties drive the gains rather than semantic classification ability.
What carries the argument
A topological representation alignment loss that matches either token-wise cosine similarity or centered kernel alignment between the student flow model's representations and the frozen teacher encoder's representations.
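The two alignment objectives named above can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes (tokens × channels), not the authors' implementation; the function names `token_cosine_loss` and `linear_cka` are hypothetical, and real setups typically project the student features first when student and teacher widths differ.

```python
import numpy as np

def token_cosine_loss(student, teacher):
    """Token-wise cosine matching loss.

    student, teacher: (num_tokens, dim) arrays, assumed to share the
    same width here (a learned projection would handle mismatches).
    Returns 1 minus the mean per-token cosine similarity, so 0 means
    every student token points the same way as its teacher token.
    """
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    return 1.0 - np.mean(np.sum(s * t, axis=1))

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices.

    X: (n, d1), Y: (n, d2); the widths may differ because CKA compares
    the token-by-token similarity structures, not raw features.
    Returns a value in [0, 1], with 1 meaning identical structure.
    """
    X = X - X.mean(axis=0)  # column-center both representations
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

In a training loop, a term such as `1 - linear_cka(student_tokens, teacher_tokens)` at a chosen transformer layer would be added to the flow-matching loss; since the teacher is frozen, the term disappears at inference.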
If this is right
- Training converges up to 6.9 times faster than unaligned flow-matching baselines.
- Assembly accuracy rises on in-distribution benchmarks spanning geometric, semantic, and inter-object tasks.
- Performance holds up better under domain shifts to unseen real-world and synthetic datasets.
- Zero-shot transfer gains are especially large compared with prior methods.
- State-of-the-art results are reached with zero added cost at inference time.
Where Pith is reading between the lines
- Teacher encoders selected for geometric rather than semantic properties are likely to produce stronger alignment benefits in assembly tasks.
- The same alignment strategy could be applied to other generative backbones such as diffusion models for 3D tasks.
- Systematic sweeps of alignment depth across different network architectures would identify general rules for when relational distillation helps most.
- Extending the approach to time-varying or articulated assemblies would test whether the learned topological relations remain useful beyond static part placement.
Load-bearing premise
That geometry- and contact-centric properties extracted from the frozen teacher encoder supply the right relational guidance for the flow model, and that alignment at later layers improves outcomes without negative transfer.
What would settle it
A controlled training run on one of the five assembly benchmarks in which adding the TORA alignment loss produces slower convergence or lower final accuracy than the baseline flow model without alignment.
Original abstract
Flow-matching methods for 3D shape assembly learn point-wise velocity fields that transport parts toward assembled configurations, yet they receive no explicit guidance about which cross-part interactions should drive the motion. We introduce TORA, a topology-first representation alignment framework that distills relational structure from a frozen pretrained 3D encoder into the flow-matching backbone during training. We first realize this via simple instantiation, token-wise cosine matching, which injects the learned geometric descriptors from the teacher representation. We then extend to employ a Centered Kernel Alignment (CKA) loss to match the similarity structure between student and teacher representations for enhanced topological alignment. Through systematic probing of diverse 3D encoders, we show that geometry- and contact-centric teacher properties, not semantic classification ability, govern alignment effectiveness, and that alignment is most beneficial at later transformer layers where spatial structure naturally emerges. TORA introduces zero inference overhead while yielding two consistent benefits: faster convergence (up to 6.9$\times$) and improved accuracy in-distribution, along with greater robustness under domain shift. Experiments on five benchmarks spanning geometric, semantic, and inter-object assembly demonstrate state-of-the-art performance, with particularly pronounced gains in zero-shot transfer to unseen real-world and synthetic datasets. Project page: https://nahyuklee.github.io/tora.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TORA, a topology-first representation alignment framework for flow-matching models in 3D shape assembly. It distills relational geometric and contact-centric structure from a frozen pretrained 3D encoder into the flow backbone via token-wise cosine matching and Centered Kernel Alignment (CKA) loss, with alignment applied most effectively at later transformer layers. The method is claimed to deliver up to 6.9× faster convergence, higher in-distribution accuracy, improved robustness under domain shift, and state-of-the-art results on five benchmarks spanning geometric, semantic, and inter-object assembly tasks, all with zero inference overhead.
Significance. If the reported gains hold under rigorous verification, TORA would offer a practical, low-cost way to inject useful relational priors from pretrained geometry encoders into flow-matching pipelines, with particular value for zero-shot transfer in assembly tasks. The systematic encoder-probing experiments provide useful empirical guidance on which pretrained properties (geometry/contact vs. semantics) transfer effectively.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. We are encouraged that the topology-first alignment approach, its empirical guidance on encoder properties, and the reported gains in convergence and robustness are viewed as potentially valuable for flow-matching pipelines.
Circularity Check
No significant circularity
Full rationale
The paper's central claims rest on an empirical pipeline: a frozen external pretrained 3D encoder supplies relational targets (via token-wise cosine or CKA losses) that are injected into a flow-matching student during training; benefits are then measured on held-out benchmarks. No derivation, equation, or performance metric is defined in terms of itself or reduced to a fitted parameter that is later renamed a prediction. Alignment targets originate outside the model being trained, and reported gains (convergence speed, accuracy, zero-shot robustness) are independent observables rather than tautological consequences of the loss. Any self-citations are peripheral and non-load-bearing for the core argument.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Frozen pretrained 3D encoders capture geometry- and contact-centric relational structure that is useful for guiding flow-matching assembly.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean : alexander_duality_circle_linking (tagged: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear. Matched passage: "alignment is most beneficial at later transformer layers where spatial structure naturally emerges"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.