Recognition: 2 theorem links · Lean theorems
Voxify3D: Pixel Art Meets Volumetric Rendering
Pith reviewed 2026-05-16 23:57 UTC · model grok-4.3
The pith
Voxify3D converts 3D meshes into voxel art by aligning straight-on orthographic pixel renders with patch-level CLIP features and using differentiable palette quantization to keep semantics intact under discretization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Voxify3D bridges 3D mesh optimization with 2D pixel art supervision by integrating three components: orthographic pixel art supervision, which removes perspective distortion for precise voxel-pixel alignment; patch-based CLIP alignment, which preserves semantics across discretization levels; and palette-constrained Gumbel-Softmax quantization, which enables differentiable optimization over discrete color spaces with controllable palette strategies. Together these address semantic preservation under extreme discretization, pixel-art aesthetics through volumetric rendering, and end-to-end discrete optimization.
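To see why orthographic rendering gives exact voxel-pixel alignment, consider a camera looking straight down one grid axis: every z-column of an N×N×N grid projects to exactly one pixel of an N×N render, so a pixel-art loss can supervise individual voxels. Below is a minimal sketch of that correspondence, not the authors' renderer; the compositing rule and variable names are illustrative assumptions.

```python
import numpy as np

def orthographic_render(occ, rgb):
    """Front-view orthographic render of a voxel grid.

    occ: (N, N, N) occupancy in [0, 1]; rgb: (N, N, N, 3) colors.
    With an orthographic camera along the z-axis, voxel (x, y, z) projects
    to pixel (x, y) at every depth z -- there is no 1/depth term -- so the
    render has exactly one pixel per voxel column.
    """
    N = occ.shape[0]
    image = np.zeros((N, N, 3))
    transmittance = np.ones((N, N))
    for z in range(N):  # front-to-back alpha compositing along the view axis
        alpha = occ[:, :, z]
        image += (transmittance * alpha)[..., None] * rgb[:, :, z]
        transmittance *= 1.0 - alpha
    return image

# A perspective camera would instead scale (x, y) by 1/depth, smearing one
# voxel across several pixels and breaking the one-to-one alignment.
```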
What carries the argument
The synergistic integration of orthographic pixel art supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, which together allow gradient-based optimization of voxel grids from meshes while enforcing pixel-level aesthetics and semantic consistency.
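A minimal PyTorch sketch of the quantization component, assuming per-voxel logits over a learnable K-color palette; the straight-through setting and all names here are illustrative, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

K = 4                                 # palette size (2-8 in the paper)
N = 20                                # voxel grid resolution
palette = torch.rand(K, 3, requires_grad=True)        # learnable RGB palette
logits = torch.zeros(N, N, N, K, requires_grad=True)  # per-voxel color logits

def quantize(logits, palette, tau=1.0):
    # Relaxed one-hot sample over palette entries. hard=True snaps each voxel
    # to a single discrete color in the forward pass while keeping soft
    # gradients (straight-through estimator).
    w = F.gumbel_softmax(logits, tau=tau, hard=True)   # (N, N, N, K)
    return w @ palette                                 # (N, N, N, 3)

colors = quantize(logits, palette)
loss = colors.mean()   # stand-in for the rendering and CLIP losses
loss.backward()        # gradients reach both the logits and the palette
```

Because the palette itself receives gradients, the palette strategy (size, initialization, whether entries are frozen) becomes a controllable knob rather than a post-processing step.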
If this is right
- Voxel outputs can be generated controllably at abstraction levels from 2 to 8 colors and 20x to 50x resolutions while remaining recognizable.
- End-to-end training replaces separate post-processing steps for color quantization and geometry abstraction in voxel pipelines.
- The framework produces higher semantic alignment scores and stronger user preference than prior mesh-to-voxel methods across characters.
- Palette strategies become tunable parameters that directly influence the final discrete color coherence without retraining.
Where Pith is reading between the lines
- The same supervision stack could be tested on generating voxel versions of static environments or props to check if semantic preservation generalizes beyond characters.
- Combining this discretization approach with rigged 3D models might enable direct creation of animated voxel characters for games.
- The reliance on patch CLIP suggests the method could adapt to other extreme stylizations such as low-poly or sprite-based outputs from 3D sources.
Load-bearing premise
Straight-on orthographic renders plus patch-level CLIP matches will keep the original meaning and the desired pixel-art look intact once the mesh is locked into a coarse grid and a tiny color palette.
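The patch-level half of this premise can be written as a loss. A minimal sketch assuming the OpenAI clip package; the patch size, upsampling, and cosine-distance form are assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def patch_clip_loss(render_a, render_b, patch=112):
    """Cosine distance between CLIP embeddings of corresponding patches.

    render_a, render_b: (3, H, W) images already normalized for CLIP.
    Each patch is upsampled to CLIP's 224x224 input resolution, so semantics
    are compared locally rather than over the whole image.
    """
    losses = []
    _, H, W = render_a.shape
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            pa = render_a[:, y:y+patch, x:x+patch].unsqueeze(0)
            pb = render_b[:, y:y+patch, x:x+patch].unsqueeze(0)
            pa = F.interpolate(pa, size=224, mode="bicubic")
            pb = F.interpolate(pb, size=224, mode="bicubic")
            fa = model.encode_image(pa)
            fb = model.encode_image(pb)
            losses.append(1 - F.cosine_similarity(fa, fb).mean())
    return torch.stack(losses).mean()
```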
What would settle it
Run the method on standard test meshes such as a human character or simple object at 20x resolution with a 4-color palette and measure whether human viewers can still correctly identify the source object from the voxel output at rates above chance.
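The above-chance criterion is a one-sided binomial test. A minimal sketch with scipy, using made-up counts purely for illustration:

```python
from scipy.stats import binomtest

# Hypothetical study: 30 viewers each pick the source object from 5 options,
# so chance identification is p = 0.2.
n_trials, n_correct, chance = 30, 19, 0.2
result = binomtest(n_correct, n_trials, p=chance, alternative="greater")
print(f"identification rate {n_correct / n_trials:.2f}, p = {result.pvalue:.4f}")
# A small p-value indicates identification above chance, i.e. the voxel
# output preserved enough semantics to be recognized.
```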
read the original abstract
Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging due to conflicting requirements of geometric abstraction, semantic preservation, and discrete color coherence. Existing methods either over-simplify geometry or fail to achieve the pixel-precise, palette-constrained aesthetics of voxel art. We introduce Voxify3D, a differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Our core innovation lies in the synergistic integration of three components: (1) orthographic pixel art supervision that eliminates perspective distortion for precise voxel-pixel alignment; (2) patch-based CLIP alignment that preserves semantics across discretization levels; (3) palette-constrained Gumbel-Softmax quantization enabling differentiable optimization over discrete color spaces with controllable palette strategies. This integration addresses fundamental challenges: semantic preservation under extreme discretization, pixel-art aesthetics through volumetric rendering, and end-to-end discrete optimization. Experiments show superior performance (37.12 CLIP-IQA, 77.90% user preference) across diverse characters and controllable abstraction (2-8 colors, 20x-50x resolutions). Project page: https://yichuanh.github.io/Voxify-3D/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. Voxify3D is a differentiable two-stage framework for converting 3D meshes into voxel art. It combines orthographic pixel art supervision to eliminate perspective distortion, patch-based CLIP alignment to preserve semantics under discretization, and palette-constrained Gumbel-Softmax quantization to enable end-to-end optimization over discrete color palettes. Experiments report superior results with 37.12 CLIP-IQA and 77.90% user preference across characters at 2-8 colors and 20x-50x resolutions.
Significance. If the central claims hold after proper validation, the work could advance automated voxel stylization for games and digital media by providing a practical pipeline that jointly handles geometric abstraction, semantic fidelity, and palette constraints. The explicit use of orthographic supervision and differentiable quantization offers a concrete route to controllable discrete 3D outputs.
major comments (2)
- [Abstract] The reported superiority (37.12 CLIP-IQA, 77.90% user preference) is presented without any description of baselines, ablation studies, or the precise experimental protocol (view selection, number of test meshes, evaluation views). This information is load-bearing for the claim that the three-component integration outperforms prior art.
- [Abstract] The claim that orthographic + patch-CLIP supervision plus Gumbel-Softmax produces 3D-consistent semantics after discretization lacks an explicit cross-view consistency term. Nothing in the pipeline is shown to penalize a voxel assignment that is coherent only on the training orthographic axes but collapses or recolors under rotation, which directly undermines the semantic-preservation guarantee.
minor comments (1)
- [Abstract] No statement on code, data, or model availability is provided despite the project page link.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed comments. We have revised the manuscript to address the concerns about experimental context in the abstract and to clarify the mechanisms supporting 3D semantic consistency. Point-by-point responses follow.
read point-by-point responses
-
Referee: [Abstract] The reported superiority (37.12 CLIP-IQA, 77.90% user preference) is presented without any description of baselines, ablation studies, or the precise experimental protocol (view selection, number of test meshes, evaluation views). This information is load-bearing for the claim that the three-component integration outperforms prior art.
Authors: We agree that the abstract would benefit from additional context to substantiate the reported metrics. In the revised version we have expanded the abstract to note that results are compared against prior voxel stylization and mesh-to-voxel methods, that the test set comprises 20 diverse character meshes, that evaluation uses six orthographic views per mesh, and that ablations isolating each of the three components appear in Section 4.2. The full protocol (view selection, palette sizes, and user-study details) is already described in Section 4.1; we now reference it explicitly from the abstract. revision: yes
-
Referee: [Abstract] The claim that orthographic + patch-CLIP supervision plus Gumbel-Softmax produces 3D-consistent semantics after discretization lacks an explicit cross-view consistency term. Nothing in the pipeline is shown to penalize a voxel assignment that is coherent only on the training orthographic axes but collapses or recolors under rotation, which directly undermines the semantic-preservation guarantee.
Authors: The shared 3D voxel grid is optimized jointly from multiple orthographic axes; because voxel colors and occupancies are viewpoint-independent, any assignment that is coherent on the supervised axes remains coherent under arbitrary rotation by construction. Volumetric rendering further enforces this property. We have added a short paragraph in the method section and a clarifying sentence in the abstract to make this implicit consistency explicit. We also include new supplementary visualizations rendered from random viewpoints to demonstrate preservation of semantics and palette under rotation. An additional explicit consistency loss is not required for the current claims but could be explored in future work. revision: partial
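The rebuttal's mechanism is easy to sanity-check in code: the grid stores one color per voxel, so renders from any axis read the same assignments. A minimal sketch, under the same illustrative compositing as the earlier render example; it also makes visible the residual risk the referee raises, namely voxels hidden from every supervised axis:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20
occ = (rng.random((N, N, N)) < 0.1).astype(float)  # toy occupancy
rgb = rng.random((N, N, N, 3))                     # one color per voxel

def render_along(axis, occ, rgb):
    # Front-to-back compositing along the chosen axis; the grid itself never
    # changes, only the traversal order.
    occ = np.moveaxis(occ, axis, -1)
    rgb = np.moveaxis(rgb, axis, -2)
    image, trans = np.zeros((N, N, 3)), np.ones((N, N))
    for z in range(N):
        alpha = occ[:, :, z]
        image += (trans * alpha)[..., None] * rgb[:, :, z]
        trans *= 1.0 - alpha
    return image

front, side = render_along(2, occ, rgb), render_along(0, occ, rgb)
# Both renders sample identical per-voxel colors by construction. What
# multi-axis supervision alone cannot guarantee is that voxels occluded from
# all training axes received sensible colors -- the part of the referee's
# concern the new random-viewpoint visualizations are meant to address.
```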
Circularity Check
No circularity: independent optimization pipeline with no self-referential derivations
full rationale
The paper presents Voxify3D as a two-stage differentiable framework that combines orthographic pixel art supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization. No equations, loss formulations, or uniqueness theorems are shown that reduce claimed outputs to inputs by construction. Reported metrics (CLIP-IQA, user preference) are experimental results rather than predictions forced by fitted parameters or self-citations. The method description is self-contained and evaluated against external benchmarks, with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
free parameters (2)
- palette size
- voxel resolution
axioms (2)
- standard math: Gumbel-Softmax provides a differentiable approximation to discrete sampling (stated explicitly below)
- domain assumption: CLIP features on patches capture semantic content at multiple discretization levels
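For reference, the first axiom has a standard closed form (Jang et al. [39]; Maddison et al. [64]): with palette class probabilities \pi_1, ..., \pi_K and temperature \tau, a relaxed one-hot sample is

```latex
y_i = \frac{\exp\big((\log \pi_i + g_i)/\tau\big)}
           {\sum_{j=1}^{K} \exp\big((\log \pi_j + g_j)/\tau\big)},
\qquad g_i = -\log(-\log u_i), \quad u_i \sim \mathrm{Uniform}(0,1).
```

As \tau \to 0 the sample approaches a true categorical draw while remaining differentiable in \pi for any \tau > 0, which is what lets the palette assignment sit inside a gradient-based pipeline.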
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Passage: "palette-constrained Gumbel-Softmax quantization enabling differentiable optimization over discrete color spaces"
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Passage: "orthographic pixel art supervision that eliminates perspective distortion for precise voxel-pixel alignment"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] Christoph Bader, Dominik Kolb, James C. Weaver, Sunanda Sharma, Ahmed Hosny, João Costa, and Neri Oxman. Making data matter: Voxel printing for the digital fabrication of data across scales and domains. Science Advances, 4(5):eaas8652, 2018.
[2] Maciej Bala, Yin Cui, Yifan Ding, Yunhao Ge, Zekun Hao, Jon Hasselgren, Jacob Huffman, Jingyi Jin, JP Lewis, Zhaoshuo Li, et al. Edify 3D: Scalable high-quality 3D asset generation. arXiv preprint arXiv:2411.07135, 2024.
[3] Alexandre Binninger and Olga Sorkine-Hornung. SD-πXL: Generating low-resolution quantized imagery via score distillation. In SIGGRAPH Asia Conference Papers, pages 1–12, 2024.
[4] Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations (ICLR), 2019.
[5] Chen-Hao Chao, Wei-Fang Sun, Bo-Wun Cheng, Yi-Chen Lo, Chia-Che Chang, Yu-Lun Liu, Yu-Lin Chang, Chia-Ping Chen, and Chun-Yi Lee. Denoising likelihood score matching for conditional score-based data generation. arXiv preprint arXiv:2203.14206, 2022.
[6] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. In European Conference on Computer Vision, pages 333–350. Springer, 2022.
[7] Bo-Yu Chen, Wei-Chen Chiu, and Yu-Lun Liu. Improving robustness for joint optimization of camera pose and decomposed low-rank tensorial radiance fields. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 990–1000, 2024.
[8] Lianggangxu Chen, Xuejiao Wang, Jiale Lu, Shaohui Lin, Changbo Wang, and Gaoqi He. CLIP-driven open-vocabulary 3D scene graph generation via cross-modality contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27863–27873, 2024.
[9] Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, and Wenwu Zhu. Q-DiT: Accurate post-training quantization for diffusion transformers. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 28306–28315, 2025.
[10] Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3D: Disentangling geometry and appearance for high-quality text-to-3D content creation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22246–22256, 2023.
[11] Shihan Chen, Qingsong Yan, Yingjie Qu, Wang Gao, Junxing Yang, and Fei Deng. Ortho-NeRF: Generating a true digital orthophoto map using the neural radiance field from unmanned aerial vehicle images. Geo-spatial Information Science, 28(2):741–760, 2025.
[12] Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, and Jiaya Jia. VoxelNeXt: Fully sparse VoxelNet for 3D object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21674–21683, 2023.
[13] Yingshu Chen, Huajian Huang, Tuan-Anh Vu, Ka Chun Shum, and Sai-Kit Yeung. StyleCity: Large-scale 3D urban scenes stylization. In European Conference on Computer Vision, pages 395–413. Springer, 2024.
[14] Sungjoon Choi and Munchurl Kim. Content-adaptive image downscaling. In ICCV, pages 261–269, 2015.
[15] SeungJeh Chung, JooHyun Park, and HyeongYeop Kang. 3DStyleGLIP: Part-tailored text-guided 3D neural stylization. arXiv preprint arXiv:2404.02634, 2024.
[16] David Coeurjolly, Pierre Gueth, and Jacques-Olivier Lachaud. Regularization of voxel art. In ACM SIGGRAPH 2018 Talks, pages 1–2, 2018.
[17] Flávio Coutinho and Luiz Chaimowicz. Generating pixel art character sprites using GANs. arXiv preprint arXiv:2208.06413, 2022.
[18] Dale Decatur, Itai Lang, Kfir Aberman, and Rana Hanocka. 3D Paintbrush: Local stylization of 3D shapes with cascaded score distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4473–4483, 2024.
[19] Patrick Esser, Robin Rombach, and Björn Ommer. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12873–12883, 2021.
[20] Kevin Frans, Lisa Soros, and Olaf Witkowski. CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders. Advances in Neural Information Processing Systems, 35:5207–5218, 2022.
[21] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
[22] Haruo Fujiwara, Yusuke Mukuta, and Tatsuya Harada. Style-NeRF2NeRF: 3D style transfer from style-aligned multi-view images. In SIGGRAPH Asia 2024 Conference Papers, pages 1–10, 2024.
[23] Stephan J. Garbin, Marek Kowalski, Matthew Johnson, Jamie Shotton, and Julien Valentin. FastNeRF: High-fidelity neural rendering at 200fps. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14346–14355, 2021.
[24] Jiahao Ge, Mingjun Zhou, and Chi-Wing Fu. Learn to create simple LEGO micro buildings. ACM Transactions on Graphics (TOG), 43(6):1–13, 2024.
[25] Timothy Gerstner, Doug DeCarlo, Marc Alexa, Adam Finkelstein, Yotam Gingold, and Andrew Nealen. Pixelated image abstraction with integrated user constraints. Computers & Graphics, 37(5):333–347, 2013.
[26] Guilherme Gomes Haetinger, Jingwei Tang, Raphael Ortiz, Paul Kanyuk, and Vinicius Azevedo. Controllable neural style transfer for dynamic meshes. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024.
[27] Google DeepMind. Gemini models: Product overview. https://deepmind.google/technologies/gemini/, 2025. Accessed November 21, 2025.
[28] Chu Han, Qiang Wen, Shengfeng He, Qianshu Zhu, Yinjie Tan, Guoqiang Han, and Tien-Tsin Wong. Deep unsupervised pixelization. ACM Transactions on Graphics (SIGGRAPH Asia 2018 issue), 37(6):243:1–243:11, 2018.
[29] Chu Han, Qiang Wen, Shengfeng He, Qianshu Zhu, Yinjie Tan, Guoqiang Han, and Tien-Tsin Wong. Deep unsupervised pixelization. ACM Transactions on Graphics (TOG), 37(6):1–11, 2018.
[30] Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, and Angjoo Kanazawa. Instruct-NeRF2NeRF: Editing 3D scenes with instructions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8874–8884, 2023.
[31] Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3D. arXiv preprint arXiv:2311.04400, 2023.
[32] Wentao Hu, Jia Zheng, Zixin Zhang, Xiaojun Yuan, Jian Yin, and Zihan Zhou. PlankAssembly: Robust 3D reconstruction from three orthographic views with learnt shape programs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18495–18505, 2023.
[33] Siyu Huang, Jie An, Donglai Wei, Jiebo Luo, and Hanspeter Pfister. QuantArt: Quantizing image style transfer towards high visual fidelity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5947–5956, 2023.
[34] Yuhang Huang, Shilong Zou, Xinwang Liu, and Kai Xu. Part-aware shape generation with latent 3D diffusion of neural voxel fields. IEEE Transactions on Visualization and Computer Graphics, 2025.
[35] Zixuan Huang, Mark Boss, Aaryaman Vasishta, James M. Rehg, and Varun Jampani. SPAR3D: Stable point-aware reconstruction of 3D objects from single images. arXiv preprint arXiv:2501.04689, 2025.
[36] Yuki Igarashi and Takeo Igarashi. Pixel art adaptation for handicraft fabrication. Computer Graphics Forum (Pacific Graphics 2022), 41(7):489–494, 2022.
[37] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, pages 1125–1134, 2017.
[38] Ajay Jain, Amber Xie, and Pieter Abbeel. VectorFusion: Text-to-SVG by abstracting pixel-based diffusion models. arXiv preprint arXiv:2211.13845, 2022.
[39] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax. In Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.
[40] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. arXiv preprint arXiv:1603.08155, 2016.
[41] Bumsoo Kim, Sanghyun Byun, Yonghoon Jung, Wonseop Shin, Sareer UI Amin, and Sanghyun Seo. Minecraft-ify: Minecraft style image generation with text-guided image editing for in-game application. arXiv preprint arXiv:2402.05448, 2024.
[42] Jeong Joon Kim, Youngjung Hwang, Jaesung Park, Jaejun Choi, and Taesup Kim. DiffusionCLIP: Text-guided image manipulation using diffusion models. arXiv preprint arXiv:2110.02711, 2022.
[43] Zhengfei Kuang, Fujun Luan, Sai Bi, Zhixin Shu, Gordon Wetzstein, and Kalyan Sunkavalli. PaletteNeRF: Palette-based appearance editing of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20691–20700, 2023.
[44] Jae-Hyeok Lee and Dae-Shik Kim. ICE-NeRF: Interactive color editing of NeRFs via decomposition-aware weight optimization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3491–3501, 2023.
[45] Ryan Hardesty Lewis. Voxelizing Google Earth: A pipeline for new virtual worlds. In ACM SIGGRAPH 2024 Labs, pages 1–2, 2024.
[46] Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, and Liefeng Bo. Compressing volumetric radiance fields to 1 MB. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4222–4231, 2023.
[47] Muheng Li, Yueqi Duan, Jie Zhou, and Jiwen Lu. Diffusion-SDF: Text-to-shape via voxelized diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12642–12651, 2023.
[48] Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y. C. Chen, Cheng-Hao Kuo, and Min Sun. GenRC: Generative 3D room completion from sparse image collections. In European Conference on Computer Vision, pages 146–163. Springer, 2024.
[49] Yuxuan Li, Robin Rombach, Yiqin Zhang, Xiaohang Zhan, Wenqiang Xu, Patrick Esser, and Björn Ommer. Blended latent diffusion: Text-driven editing of natural images. arXiv preprint arXiv:2301.11093, 2023.
[50] Yiming Li, Zhiding Yu, Christopher Choy, Chaowei Xiao, Jose M. Alvarez, Sanja Fidler, Chen Feng, and Anima Anandkumar. VoxFormer: Sparse voxel transformer for camera-based 3D semantic scene completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9087–9098, 2023.
[51] Yuan Li, Zhihao Liu, Bedrich Benes, Xiaopeng Zhang, and Jianwei Guo. SVDTree: Semantic voxel diffusion for single image tree reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4692–4702, 2024.
[52] Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3D: High-resolution text-to-3D content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023.
[53] Chin-Yang Lin, Chung-Ho Wu, Chang-Han Yeh, Shih-Han Yen, Cheng Sun, and Yu-Lun Liu. FrugalNeRF: Fast convergence for extreme few-shot novel view synthesis without learned priors. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11227–11238, 2025.
[54] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations (ICLR), 2019.
[55] Kunhao Liu, Fangneng Zhan, Yiwen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, and Eric P. Xing. StyleRF: Zero-shot 3D style transfer of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8338–8348, 2023.
[56] Lingjie Liu et al. Editing neural radiance fields by scene operations. In CVPR, 2022.
[57] Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems, 36, 2024.
[58] Richard Liu, Daniel Fu, Noah Tan, Itai Lang, and Rana Hanocka. WIR3D: Visually-informed and geometry-aware 3D shape abstraction. arXiv preprint arXiv:2505.04813, 2025.
[59] Weihang Liu, Xue Xian Zheng, Jingyi Yu, and Xin Lou. Content-aware radiance fields: Aligning model complexity with scene intricacy through learned bitwidth quantization. In European Conference on Computer Vision, pages 239–.
[60] Yu-Lun Liu, Chen Gao, Andreas Meuleman, Hung-Yu Tseng, Ayush Saraf, Changil Kim, Yung-Yu Chuang, Johannes Kopf, and Jia-Bin Huang. Robust dynamic radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13–23, 2023.
[61] Ivan Lopes, Fabio Pizzati, and Raoul de Charette. Material palette: Extraction of materials from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4379–4388, 2024.
[62] Frank Losasso, Frédéric Gibou, and Ronald Fedkiw. Simulating water and smoke with an octree data structure. ACM Transactions on Graphics (TOG), 23(3):457–462, 2004.
[63] Yihao Luo, Yikai Wang, Zhengrui Xiang, Yuliang Xiu, Guang Yang, and ChoonHwai Yap. Differentiable voxelization and mesh morphing. arXiv preprint arXiv:2407.11272, 2024.
[64] Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. In Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.
[65] Sachin Menon, Alex Damian, Shijia Hu, Namkug Ravi, and Cynthia Rudin. PULSE: Self-supervised photo upsampling via latent space exploration of generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2437–2445, 2020.
[66] Andreas Meuleman, Yu-Lun Liu, Chen Gao, Jia-Bin Huang, Changil Kim, Min H. Kim, and Johannes Kopf. Progressively optimized local radiance fields for robust view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16539–16548, 2023.
[67] Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, and Rana Hanocka. Text2Mesh: Text-driven neural stylization for meshes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13492–13502, 2022.
[68] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[69] Shentong Mo, Enze Xie, Ruihang Chu, Lanqing Hong, Matthias Niessner, and Zhenguo Li. DiT-3D: Exploring plain diffusion transformers for 3D shape generation. Advances in Neural Information Processing Systems, 36:67960–67971, 2023.
[70] Ron Mokady, Amir Hertz, and Amit H. Bermano. ClipCap: CLIP prefix for image captioning. In European Conference on Computer Vision (ECCV), pages 531–547. Springer, 2022.
[71] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. In ACM Transactions on Graphics (TOG), pages 102:1–102:15. ACM, 2022.
[72] Ken Museth. VDB: High-resolution sparse volumes with dynamic topology. ACM Transactions on Graphics (TOG), 32(3):1–22, 2013.
[73] Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, and Dani Lischinski. StyleCLIP: Text-driven manipulation of StyleGAN imagery. arXiv preprint arXiv:2103.17249, 2021.
[74] Sai Karthikey Pentapati, Anshul Rai, Arkady Ten, Chaitanya Atluru, and Alan Bovik. GeoScaler: Geometry and rendering-aware downsampling of 3D mesh textures. In 2025 IEEE International Conference on Image Processing (ICIP), pages 1007–1012. IEEE, 2025.
[75] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988, 2022.
[76] Ava Pun, Kangle Deng, Ruixuan Liu, Deva Ramanan, Changliu Liu, and Jun-Yan Zhu. Generating physically stable and buildable LEGO designs from text. arXiv preprint arXiv:2505.05469, 2025.
[77] Adarsh Pyarelal, Aditya Banerjee, and Kobus Barnard. Modular procedural generation for voxel maps. In AAAI Fall Symposium, pages 85–101. Springer, 2021.
[78] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[79] Claudius Reiser, Gernot Riegler, Anton S. Kaplanyan, and Marc Pollefeys. KiloNeRF: Scalable neural radiance fields with thousands of tiny MLPs. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14325–14334, 2021.
[80] Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, and Francis Williams. XCube: Large-scale 3D generative modeling using sparse voxel hierarchies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4209–4219, 2024.