PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models
Pith reviewed 2026-05-19 13:22 UTC · model grok-4.3
The pith
PacTure packs multiple views into single images to generate consistent high-resolution PBR textures for 3D meshes from text more efficiently.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PacTure shows that packing multiple rendered views of a mesh into one composite image raises the usable resolution for each view during multi-view generation while preserving the spatial relationships the image model needs to keep textures coherent across the 3D surface. When this packing is combined with fine-grained control inside a next-scale autoregressive backbone, the same model can output multiple PBR channels together at lower total inference cost than sequential per-view or cross-attention methods.
What carries the argument
View packing: the arrangement of several camera views of the mesh into one composite 2D image so that nearby surface regions remain near each other in the layout, allowing any standard 2D generative model to reason about texture continuity without retraining or added cost.
If this is right
- Global texture consistency improves because the model processes multiple views in one forward pass and can reason about continuity across packed regions.
- Inference time drops because several views are handled in fewer model evaluations while each view still receives higher effective resolution.
- The method remains compatible with any existing 2D generative model without architectural changes or retraining.
- Multiple PBR properties such as albedo, normals, roughness, and metallic can be produced together through the same autoregressive steps.
Where Pith is reading between the lines
- The same packing idea could be tested on other multi-view tasks such as consistent lighting or material editing across a mesh.
- If the packing layout were made adaptive to mesh shape rather than fixed, it might reduce wasted space on highly irregular objects.
- Applying the technique to higher base resolutions could show whether the efficiency gain scales or whether packing density eventually limits quality.
Load-bearing premise
That packing views together keeps the spatial proximity the model needs for coherent generation and does not create new boundary artifacts or force the model to be retrained.
What would settle it
Generate textures for a simple closed mesh such as a sphere using a single text prompt, unwrap the result, and inspect whether seams or color shifts appear exactly along the boundaries where the packed views meet on the 3D surface.
Figures
read the original abstract
We present PacTure, a novel framework for generating physically-based rendering (PBR) material textures for an untextured 3D mesh from a text description. Existing 2D generation-based texturing approaches either generate textures sequentially from different views, resulting in long inference times and globally inconsistent textures, or adopt multi-view generation with cross-view attention to enhance global consistency, which, however, limits the resolution for each view. In response to these weaknesses, we first introduce view packing, a novel technique that significantly increases the effective resolution for each view during multi-view generation, without imposing additional inference cost. Unlike UV mapping, it preserves the spatial proximity essential for image generation and maintains full compatibility with current 2D generative models. To further reduce the inferencing cost, we enable fine-grained control and multi-domain generation within the next-scale prediction autoregressive framework, creating an efficient multi-view PBR generation backbone. Extensive experiments show that PacTure outperforms state-of-the-art methods in both quality and efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PacTure, a framework for text-driven PBR texture generation on untextured 3D meshes. It proposes view packing to increase per-view resolution in multi-view generation while remaining compatible with existing 2D autoregressive models, combined with fine-grained control in a next-scale prediction backbone for efficient multi-domain (albedo, normal, roughness, metallic) output. The central claim is that this yields higher quality and lower inference cost than prior sequential or cross-attention multi-view methods.
Significance. If the view-packing operator demonstrably avoids boundary artifacts while delivering the claimed resolution and consistency gains, the work would offer a practical efficiency improvement for PBR texturing pipelines in computer graphics and 3D content creation.
major comments (2)
- [§3] §3 (View Packing): the claim that packing 'preserves the spatial proximity essential for image generation' and introduces no artifacts is load-bearing for the efficiency-without-retraining argument, yet the description provides no explicit operator definition or boundary-handling mechanism; without this, it is unclear whether seams are mitigated or merely assumed harmless for the autoregressive backbone.
- [§4] §4 / Table 2 (Experiments): the abstract and results section assert outperformance in quality and efficiency, but no quantitative values (e.g., FID, LPIPS, cross-view consistency, or seam-visibility scores) or ablation on packing density are referenced; this prevents verification that the packing operator actually resolves the spatial-coherence risk raised by the stress-test note.
minor comments (2)
- [§3.1] Clarify the precise packing layout (e.g., grid size, padding strategy) and its compatibility guarantees with the specific autoregressive tokenizer used.
- [§2] Add a short paragraph contrasting view packing with UV-based alternatives in terms of inference cost and model compatibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work. We provide detailed responses to each major comment and outline the revisions we will make to address the concerns about the view packing operator and experimental quantification.
read point-by-point responses
-
Referee: [§3] §3 (View Packing): the claim that packing 'preserves the spatial proximity essential for image generation' and introduces no artifacts is load-bearing for the efficiency-without-retraining argument, yet the description provides no explicit operator definition or boundary-handling mechanism; without this, it is unclear whether seams are mitigated or merely assumed harmless for the autoregressive backbone.
Authors: We are grateful to the referee for this insightful comment. The view packing technique is central to our approach, and we acknowledge that the current description in Section 3 could benefit from greater precision. We will revise the manuscript to include an explicit definition of the packing operator, specifying how multiple views are arranged in a grid within a single canvas while maintaining their relative spatial positions. Regarding boundary handling, we will detail the use of zero-padding or edge-aware blending to minimize potential seam artifacts, ensuring that the autoregressive model's next-scale prediction does not suffer from discontinuities. This addition will clarify that the method avoids introducing harmful artifacts and supports the efficiency claims without requiring model retraining. revision: yes
-
Referee: [§4] §4 / Table 2 (Experiments): the abstract and results section assert outperformance in quality and efficiency, but no quantitative values (e.g., FID, LPIPS, cross-view consistency, or seam-visibility scores) or ablation on packing density are referenced; this prevents verification that the packing operator actually resolves the spatial-coherence risk raised by the stress-test note.
Authors: We thank the referee for noting this gap in the presentation of results. Although our experiments in Section 4 and Table 2 compare PacTure against baselines in terms of visual quality and inference speed, we agree that incorporating standard quantitative metrics would enhance verifiability. In the revised version, we will report specific scores including FID and LPIPS for texture quality, as well as metrics for cross-view consistency and seam visibility. We will also add an ablation experiment on different packing densities to demonstrate how it affects spatial coherence and mitigates the risks highlighted in the stress-test. These changes will provide stronger evidence for the superiority of our method in both quality and efficiency. revision: yes
Circularity Check
No significant circularity; core contributions are independent techniques
full rationale
The paper presents view packing and fine-grained autoregressive control as novel, independent techniques that increase effective resolution and reduce inference cost while preserving compatibility with existing 2D models. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. The claimed improvements in quality and efficiency for PBR texture generation do not reduce by construction to prior inputs; the derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We formulate the arrangement of multi-view maps as a 2D rectangle bin packing problem... employs binary search to determine the maximal enlargement ratio
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat induction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
next-scale prediction autoregressive framework... multi-domain generation within the VAR T2I model
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
-
[2]
TEXTure: Text-guided texturing of 3D shapes,
E. Richardson, G. Metzer, Y . Alaluf, R. Giryes, and D. Cohen-Or, “TEXTure: Text-guided texturing of 3D shapes,” inProc. of the ACM SIGGRAPH Conference and Exhibition On Computer Graphics and Interactive Techniques (SIGGRAPH), 2023
work page 2023
-
[3]
Text2Tex: Text-driven texture synthesis via diffusion models,
D. Z. Chen, Y . Siddiqui, H. Lee, S. Tulyakov, and M. Nießner, “Text2Tex: Text-driven texture synthesis via diffusion models,” inProc. of IEEE/CVF International Conference on Computer Vision (ICCV), 2023
work page 2023
-
[4]
Paint3D: Paint anything 3D with lighting-less texture diffusion models,
X. Zeng, X. Chen, Z. Qi, W. Liu, Z. Zhao, Z. Wang, B. Fu, Y . Liu, and G. Yu, “Paint3D: Paint anything 3D with lighting-less texture diffusion models,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
work page 2024
-
[5]
TexFusion: Synthesizing 3D textures with text-guided image diffusion models,
T. Cao, K. Kreis, S. Fidler, N. Sharp, and K. Yin, “TexFusion: Synthesizing 3D textures with text-guided image diffusion models,” inProc. of IEEE/CVF International Conference on Computer Vision (ICCV), 2023
work page 2023
-
[6]
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Z. Zhao, Z. Lai, Q. Lin, Y . Zhao, H. Liu, S. Yang, Y . Feng, M. Yang, S. Zhang, X. Yang, H. Shi, S. Liu, J. Wu, Y . Lian, F. Yang, R. Tang, Z. He, X. Wang, J. Liu, X. Zuo, Z. Chen, B. Lei, H. Weng, J. Xu, Y . Zhu, X. Liu, L. Xu, C. Hu, T. Huang, L. Wang, J. Zhang, M. Chen, L. Dong, Y . Jia, Y . Cai, J. Yu, Y . Tang, H. Zhang, Z. Ye, P. He, R. Wu, C. Zhan...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[7]
GenesisTex: Adapting image denoising diffusion to texture space,
C. Gao, B. Jiang, X. Li, Y . Zhang, and Q. Yu, “GenesisTex: Adapting image denoising diffusion to texture space,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
work page 2024
-
[8]
InTeX: Interactive text-to-texture synthesis via unified depth-aware inpainting,
J. Tang, R. Lu, X. Chen, X. Wen, G. Zeng, and Z. Liu, “InTeX: Interactive text-to-texture synthesis via unified depth-aware inpainting,”arXiv preprint arXiv:2403.11878, 2024
-
[9]
Meta 3D TextureGen: Fast and consistent texture generation for 3D objects,
R. Bensadoun, Y . Kleiman, I. Azuri, O. Harosh, A. Vedaldi, N. Neverova, and O. Gafni, “Meta 3D TextureGen: Fast and consistent texture generation for 3D objects,”arXiv preprint arXiv:2407.02430, 2024
-
[10]
Meta 3D AssetGen: Text-to-mesh generation with high- quality geometry, texture, and PBR materials,
Y . Siddiqui, T. Monnier, F. Kokkinos, M. Kariya, Y . Kleiman, E. Garreau, O. Gafni, N. Neverova, A. Vedaldi, R. Shapovalov, and D. Novotný, “Meta 3D AssetGen: Text-to-mesh generation with high- quality geometry, texture, and PBR materials,” inProc. of Advances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[11]
VCD-Texture: Variance alignment based 3D-2D co-denoising for text-guided texturing,
S. Liu, C. Yu, C. Cao, W. Qian, and F. Wang, “VCD-Texture: Variance alignment based 3D-2D co-denoising for text-guided texturing,” inProc. of European Conference on Computer Vision (ECCV), 2024
work page 2024
-
[12]
CLAY: A controllable large-scale generative model for creating high-quality 3D assets,
L. Zhang, Z. Wang, Q. Zhang, Q. Qiu, A. Pang, H. Jiang, W. Yang, L. Xu, and J. Yu, “CLAY: A controllable large-scale generative model for creating high-quality 3D assets,”ACM Transactions on Graphics (TOG), 2024
work page 2024
-
[13]
TexGen: Text-guided 3D texture generation with multi-view sampling and resampling,
D. Huo, Z. Guo, X. Zuo, Z. Shi, J. Lu, P. Dai, S. Xu, L. Cheng, and Y . Yang, “TexGen: Text-guided 3D texture generation with multi-view sampling and resampling,” inProc. of European Conference on Computer Vision (ECCV), 2024
work page 2024
-
[14]
Make-A- Texture: Fast shape-aware texture generation in 3 seconds,
X. Xiang, L. S. Gorelik, Y . Fan, O. Armstrong, F. N. Iandola, Y . Li, I. Lifshitz, and R. Ranjan, “Make-A- Texture: Fast shape-aware texture generation in 3 seconds,” inProc. of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
work page 2025
-
[15]
Jointly generating multi-view consistent PBR textures using collaborative control,
S. Vainer, K. Kutsy, D. D. Nigris, C. Rowles, S. Elizarov, and S. Donné, “Jointly generating multi-view consistent PBR textures using collaborative control,”arXiv preprint arXiv:2410.06985, 2024
-
[16]
MVPaint: Synchronized multi-view diffusion for painting anything 3D,
W. Cheng, J. Mu, X. Zeng, X. Chen, A. Pang, C. Zhang, Z. Wang, B. Fu, G. Yu, Z. Liu, and L. Pan, “MVPaint: Synchronized multi-view diffusion for painting anything 3D,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
work page 2025
-
[17]
MCMat: Multiview-consistent and physically accurate PBR material generation,
S. Zhu, L. Qiu, X. Gu, Z. Zhao, C. Xu, Y . He, Z. Li, X. Han, Y . Yao, X. Cao, S. Zhu, W. Yuan, Z. Dong, and H. Zhu, “MCMat: Multiview-consistent and physically accurate PBR material generation,”arXiv preprint arXiv:2412.14148, 2024
-
[18]
TexPainter: Generative mesh texturing with multi-view consistency,
H. Zhang, Z. Pan, C. Zhang, L. Zhu, and X. Gao, “TexPainter: Generative mesh texturing with multi-view consistency,” inProc. of the ACM SIGGRAPH Conference and Exhibition On Computer Graphics and Interactive Techniques (SIGGRAPH), 2024
work page 2024
-
[19]
Pan- dora3D: A comprehensive framework for high-quality 3D shape and texture generation,
J. Yang, T. Shang, W. Sun, X. Song, Z. Cheng, S. Wang, S. Chen, W. Liu, H. Li, and P. Ji, “Pan- dora3D: A comprehensive framework for high-quality 3D shape and texture generation,”arXiv preprint arXiv:2502.14247, 2025
-
[20]
Text-guided texturing by synchronized multi-view diffusion,
Y . Liu, M. Xie, H. Liu, and T. Wong, “Text-guided texturing by synchronized multi-view diffusion,” inProc. of the ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
work page 2024
-
[21]
Texture generation on 3D meshes with Point-UV diffusion,
X. Yu, P. Dai, W. Li, L. Ma, Z. Liu, and X. Qi, “Texture generation on 3D meshes with Point-UV diffusion,” inProc. of IEEE/CVF International Conference on Computer Vision (ICCV), 2023
work page 2023
-
[22]
TEXGen: a generative diffusion model for mesh textures,
X. Yu, Z. Yuan, Y . Guo, Y . Liu, J. Liu, Y . Li, Y . Cao, D. Liang, and X. Qi, “TEXGen: a generative diffusion model for mesh textures,”ACM Transactions on Graphics (TOG), 2024
work page 2024
-
[23]
UV-free texture generation with denoising and geodesic heat diffusions,
S. Foti, S. Zafeiriou, and T. Birdal, “UV-free texture generation with denoising and geodesic heat diffusions,” arXiv preprint arXiv:2408.16762, 2024
-
[24]
TexOct: Generating textures of 3D models with octree-based diffusion,
J. Liu, C. Wu, X. Liu, X. Liu, J. Wu, H. Peng, C. Zhao, H. Feng, J. Liu, and E. Ding, “TexOct: Generating textures of 3D models with octree-based diffusion,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 11
work page 2024
-
[25]
TexGaussian: Generating high-quality PBR material via octree-based 3D Gaussian splatting,
B. Xiong, J. Liu, J. Hu, C. Wu, J. Wu, X. Liu, C. Zhao, E. Ding, and Z. Lian, “TexGaussian: Generating high-quality PBR material via octree-based 3D Gaussian splatting,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
work page 2025
-
[26]
High-resolution image synthesis with latent diffusion models,
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
work page 2022
-
[27]
Scaling rectified flow transformers for high- resolution image synthesis,
P. Esser, S. Kulal, A. Blattmann, R. Entezari, J. Müller, H. Saini, Y . Levi, D. Lorenz, A. Sauer, F. Boesel, D. Podell, T. Dockhorn, Z. English, and R. Rombach, “Scaling rectified flow transformers for high- resolution image synthesis,” inProc. of International Conference on Machine Learning (ICML), 2024
work page 2024
-
[28]
SDXL: improving latent diffusion models for high-resolution image synthesis,
D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, and R. Rombach, “SDXL: improving latent diffusion models for high-resolution image synthesis,” inProc. of International Conference on Learning Representations (ICLR), 2024
work page 2024
-
[29]
arXiv preprint arXiv:2309.15807 , year=
X. Dai, J. Hou, C. Ma, S. S. Tsai, J. Wang, R. Wang, P. Zhang, S. Vandenhende, X. Wang, A. Dubey, M. Yu, A. Kadian, F. Radenovic, D. Mahajan, K. Li, Y . Zhao, V . Petrovic, M. K. Singh, S. Motwani, Y . Wen, Y . Song, R. Sumbaly, V . Ramanathan, Z. He, P. Vajda, and D. Parikh, “Emu: Enhancing image generation models using photogenic needles in a haystack,”...
-
[30]
FlashTex: Fast relightable mesh texturing with LightControlNet,
K. Deng, T. Omernick, A. Weiss, D. Ramanan, J. Zhu, T. Zhou, and M. Agrawala, “FlashTex: Fast relightable mesh texturing with LightControlNet,” inProc. of European Conference on Computer Vision (ECCV), 2024
work page 2024
-
[31]
TexDreamer: Towards zero-shot high-fidelity 3D human texture generation,
Y . Liu, J. Zhu, J. Tang, S. Zhang, J. Zhang, W. Cao, C. Wang, Y . Wu, and D. Huang, “TexDreamer: Towards zero-shot high-fidelity 3D human texture generation,” inProc. of European Conference on Computer Vision (ECCV), 2024
work page 2024
-
[32]
Least squares conformal maps for automatic texture atlas generation,
B. Lévy, S. Petitjean, N. Ray, and J. Maillot, “Least squares conformal maps for automatic texture atlas generation,”ACM Transactions on Graphics (TOG), vol. 21, no. 3, pp. 362–371, 2002
work page 2002
-
[33]
Two-dimensional finite bin-packing algorithms,
J. O. Berkey and P. Y . Wang, “Two-dimensional finite bin-packing algorithms,”Journal of the Operational Research Society, 1987
work page 1987
-
[34]
A thousand ways to pack the bin – a practical approach to two-dimensional rectangle bin packing,
J. Jylänki, “A thousand ways to pack the bin – a practical approach to two-dimensional rectangle bin packing,” 2010
work page 2010
-
[35]
Visual autoregressive modeling: Scalable image generation via next-scale prediction,
K. Tian, Y . Jiang, Z. Yuan, B. Peng, and L. Wang, “Visual autoregressive modeling: Scalable image generation via next-scale prediction,” inProc. of Advances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[36]
Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis,
J. Han, J. Liu, Y . Jiang, B. Yan, Y . Zhang, Z. Yuan, B. Peng, and X. Liu, “Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
work page 2025
-
[37]
Pixart-Σ: Weak-to- strong training of diffusion transformer for 4K text-to-image generation,
J. Chen, C. Ge, E. Xie, Y . Wu, L. Yao, X. Ren, Z. Wang, P. Luo, H. Lu, and Z. Li, “Pixart-Σ: Weak-to- strong training of diffusion transformer for 4K text-to-image generation,” inProc. of European Conference on Computer Vision (ECCV), 2024
work page 2024
-
[38]
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
P. Sun, Y . Jiang, S. Chen, S. Zhang, B. Peng, P. Luo, and Z. Yuan, “Autoregressive model beats diffusion: Llama for scalable image generation,”arXiv preprint arXiv:2406.06525, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[39]
HART: efficient visual generation with hybrid autoregressive transformer,
H. Tang, Y . Wu, S. Yang, E. Xie, J. Chen, J. Chen, Z. Zhang, H. Cai, Y . Lu, and S. Han, “HART: efficient visual generation with hybrid autoregressive transformer,”arXiv preprint arXiv:2410.10812, 2024
-
[40]
Text2Mesh: Text-driven neural stylization for meshes,
O. Michel, R. Bar-On, R. Liu, S. Benaim, and R. Hanocka, “Text2Mesh: Text-driven neural stylization for meshes,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
work page 2022
-
[41]
Texturify: Generating textures on 3D shape surfaces,
Y . Siddiqui, J. Thies, F. Ma, Q. Shan, M. Nießner, and A. Dai, “Texturify: Generating textures on 3D shape surfaces,” inProc. of European Conference on Computer Vision (ECCV), 2022
work page 2022
-
[42]
GET3D: A generative model of high quality 3D textured shapes learned from images,
J. Gao, T. Shen, Z. Wang, W. Chen, K. Yin, D. Li, O. Litany, Z. Gojcic, and S. Fidler, “GET3D: A generative model of high quality 3D textured shapes learned from images,” inProc. of Advances in Neural Information Processing Systems (NeurIPS), 2022
work page 2022
-
[43]
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models,
N. M. Khalid, T. Xie, E. Belilovsky, and T. Popa, “CLIP-Mesh: Generating textured meshes from text using pretrained image-text models,” inProc. of the ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2022. 12
work page 2022
-
[44]
Latent-NeRF for shape-guided generation of 3D shapes and textures,
G. Metzer, E. Richardson, O. Patashnik, R. Giryes, and D. Cohen-Or, “Latent-NeRF for shape-guided generation of 3D shapes and textures,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
work page 2023
-
[45]
K. Youwang, T. Oh, and G. Pons-Moll, “Paint-it: Text-to-texture synthesis via deep convolutional texture map optimization and physically-based rendering,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
work page 2024
-
[46]
TextureDreamer: Image-guided texture synthesis through geometry-aware diffusion,
Y . Yeh, J. Huang, C. Kim, L. Xiao, T. Nguyen-Phuoc, N. Khan, C. Zhang, M. Chandraker, C. S. Marshall, Z. Dong, and Z. Li, “TextureDreamer: Image-guided texture synthesis through geometry-aware diffusion,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
work page 2024
-
[47]
DreamMat: High-quality PBR material generation with geometry- and light-aware diffusion models,
Y . Zhang, Y . Liu, Z. Xie, L. Yang, Z. Liu, M. Yang, R. Zhang, Q. Kou, C. Lin, W. Wang, and X. Jin, “DreamMat: High-quality PBR material generation with geometry- and light-aware diffusion models,” ACM Transactions on Graphics (TOG), 2024
work page 2024
-
[48]
Learning transferable visual models from natural language supervision,
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” inProc. of International Conference on Machine Learning (ICML), 2021
work page 2021
-
[49]
Generative Adversarial Networks
I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y . Bengio, “Generative adversarial networks,”arXiv preprint arXiv:1406.2661, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[50]
DreamFusion: Text-to-3D using 2D diffusion,
B. Poole, A. Jain, J. T. Barron, and B. Mildenhall, “DreamFusion: Text-to-3D using 2D diffusion,” inProc. of International Conference on Learning Representations (ICLR), 2023
work page 2023
-
[51]
Adding conditional control to text-to-image diffusion models,
L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proc. of IEEE/CVF International Conference on Computer Vision (ICCV), 2023
work page 2023
-
[52]
MatAtlas: Text-driven consistent geometry texturing and material assignment,
D. Ceylan, V . Deschaintre, T. Groueix, R. Martin, C. P. Huang, R. Rouffet, V . G. Kim, and G. Las- sagne, “MatAtlas: Text-driven consistent geometry texturing and material assignment,”arXiv preprint arXiv:2404.02899, 2024
-
[53]
Y . Fang, Z. Sun, T. Wu, J. Wang, Z. Liu, G. Wetzstein, and D. Lin, “Make-it-Real: Unleashing large multi- modal model’s ability for painting 3D objects with realistic materials,”arXiv preprint arXiv:2404.16829, 2024
-
[54]
MaterialSeg3D: Segmenting dense materials from 2D priors for 3D assets,
Z. Li, R. Gan, C. Luo, Y . Wang, J. Liu, Z. Zhu, Q. Li, X. Yin, M. Zhang, Z. Zhang, and J. Peng, “MaterialSeg3D: Segmenting dense materials from 2D priors for 3D assets,” inProc. of ACM International Conference on Multimedia (ACM MM), 2024
work page 2024
-
[55]
MaPa: Text-driven photorealistic material painting for 3D shapes,
S. Zhang, S. Peng, T. Xu, Y . Yang, T. Chen, N. Xue, Y . Shen, H. Bao, R. Hu, and X. Zhou, “MaPa: Text-driven photorealistic material painting for 3D shapes,” inProc. of the ACM SIGGRAPH Conference and Exhibition On Computer Graphics and Interactive Techniques (SIGGRAPH), 2024
work page 2024
-
[56]
U-Net: Convolutional networks for biomedical image seg- mentation,
O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image seg- mentation,” inProc. of International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015
work page 2015
-
[57]
C. Mou, X. Wang, L. Xie, Y . Wu, J. Zhang, Z. Qi, and Y . Shan, “T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models,” inProc. of Association for the Advancement of Artificial Intelligence, 2024
work page 2024
-
[58]
ControlNeXt: Powerful and efficient control for image and video generation,
B. Peng, J. Wang, Y . Zhang, W. Li, M. Yang, and J. Jia, “ControlNeXt: Powerful and efficient control for image and video generation,”arXiv preprint arXiv:2408.06070, 2024
-
[59]
OminiControl: Minimal and universal control for diffusion transformer,
Z. Tan, S. Liu, X. Yang, Q. Xue, and X. Wang, “OminiControl: Minimal and universal control for diffusion transformer,”arXiv preprint arXiv:2411.15098, 2024
-
[60]
Uni-ControlNet: All-in-one control to text-to-image diffusion models,
S. Zhao, D. Chen, Y . Chen, J. Bao, S. Hao, L. Yuan, and K. K. Wong, “Uni-ControlNet: All-in-one control to text-to-image diffusion models,” inProc. of Advances in Neural Information Processing Systems (NeurIPS), 2023
work page 2023
-
[61]
UniControl: A unified diffusion model for controllable visual generation in the wild,
C. Qin, S. Zhang, N. Yu, Y . Feng, X. Yang, Y . Zhou, H. Wang, J. C. Niebles, C. Xiong, S. Savarese, S. Ermon, Y . Fu, and R. Xu, “UniControl: A unified diffusion model for controllable visual generation in the wild,” inProc. of Advances in Neural Information Processing Systems (NeurIPS), 2023
work page 2023
-
[62]
ControlNet++: Improving conditional controls with efficient consistency feedback,
M. Li, T. Yang, H. Kuang, J. Wu, Z. Wang, X. Xiao, and C. Chen, “ControlNet++: Improving conditional controls with efficient consistency feedback,” inProc. of European Conference on Computer Vision (ECCV), 2024. 13
work page 2024
-
[63]
ControlAR: Controllable image generation with autoregressive models,
Z. Li, T. Cheng, S. Chen, P. Sun, H. Shen, L. Ran, X. Chen, W. Liu, and X. Wang, “ControlAR: Controllable image generation with autoregressive models,”arXiv preprint arXiv:2410.02705, 2024
-
[64]
Dinov2: Learning robust visual features without supervision,
M. Oquab, T. Darcet, T. Moutakanni, H. V . V o, M. Szafraniec, V . Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P. Huang, S. Li, I. Misra, M. Rabbat, V . Sharma, G. Synnaeve, H. Xu, H. Jégou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski, “Dinov2: Learning robust visual features without supervi...
work page 2024
-
[65]
J. Chen, Y . Wu, S. Luo, E. Xie, S. Paul, P. Luo, H. Zhao, and Z. Li, “Pixart-δ: Fast and controllable image generation with latent consistency models,”arXiv preprint arXiv:2401.05252, 2024
-
[66]
Pixart-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis,
J. Chen, J. Yu, C. Ge, L. Yao, E. Xie, Z. Wang, J. T. Kwok, P. Luo, H. Lu, and Z. Li, “Pixart-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis,” inProc. of International Conference on Learning Representations (ICLR), 2024
work page 2024
-
[67]
arXiv preprint arXiv:2406.09750 (2024)
X. Li, K. Qiu, H. Chen, J. Kuen, Z. Lin, R. Singh, and B. Raj, “ControlV AR: Exploring controllable visual autoregressive modeling,”arXiv preprint arXiv:2406.09750, 2024
-
[68]
CAR: controllable autoregressive modeling for visual generation,
Z. Yao, J. Li, Y . Zhou, Y . Liu, X. Jiang, C. Wang, F. Zheng, Y . Zou, and L. Li, “CAR: controllable autoregressive modeling for visual generation,”arXiv preprint arXiv:2410.04671, 2024
-
[69]
Neural discrete representation learning,
A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” inProc. of Advances in Neural Information Processing Systems (NeurIPS), 2017
work page 2017
-
[70]
IntrinsicAnything: Learning diffusion priors for inverse rendering under unknown illumination,
X. Chen, S. Peng, D. Yang, Y . Liu, B. Pan, C. Lv, and X. Zhou, “IntrinsicAnything: Learning diffusion priors for inverse rendering under unknown illumination,” inProc. of European Conference on Computer Vision (ECCV), 2024
work page 2024
-
[71]
Kaolin: A pytorch library for accelerating 3D deep learning research,
K. M. Jatavallabhula, E. J. Smith, J. Lafleche, C. F. Tsang, A. Rozantsev, W. Chen, T. Xiang, R. Lebaredian, and S. Fidler, “Kaolin: A pytorch library for accelerating 3D deep learning research,”arXiv preprint arXiv:1911.05063, 2019
-
[72]
DRCT: saving image super-resolution away from information bottleneck,
C. Hsu, C. Lee, and Y . Chou, “DRCT: saving image super-resolution away from information bottleneck,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024
work page 2024
- [73]
-
[74]
Zero-1-to-3: Zero-shot one image to 3D object,
R. Liu, R. Wu, B. V . Hoorick, P. Tokmakov, S. Zakharov, and C. V ondrick, “Zero-1-to-3: Zero-shot one image to 3D object,” inProc. of IEEE/CVF International Conference on Computer Vision (ICCV), 2023
work page 2023
-
[75]
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
R. Shi, H. Chen, Z. Zhang, M. Liu, C. Xu, X. Wei, L. Chen, C. Zeng, and H. Su, “Zero123++: a single image to consistent multi-view diffusion base model,”arXiv preprint arXiv:2310.15110, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[76]
Objaverse: A universe of annotated 3D objects,
M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. VanderBilt, L. Schmidt, K. Ehsani, A. Kem- bhavi, and A. Farhadi, “Objaverse: A universe of annotated 3D objects,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
work page 2023
-
[77]
Objaverse-XL: A universe of 10M+ 3D objects,
M. Deitke, R. Liu, M. Wallingford, H. Ngo, O. Michel, A. Kusupati, A. Fan, C. Laforte, V . V oleti, S. Y . Gadre, E. VanderBilt, A. Kembhavi, C. V ondrick, G. Gkioxari, K. Ehsani, L. Schmidt, and A. Farhadi, “Objaverse-XL: A universe of 10M+ 3D objects,” inProc. of Advances in Neural Information Processing Systems (NeurIPS), 2023
work page 2023
-
[78]
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
H. Ye, J. Zhang, S. Liu, X. Han, and W. Yang, “IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models,”arXiv preprint arXiv:2308.06721, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[79]
SciPy 1.0: fundamental algorithms for scientific computing in Python,
P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright,et al., “SciPy 1.0: fundamental algorithms for scientific computing in Python,” Nature methods, 2020
work page 2020
-
[80]
Taming transformers for high-resolution image synthesis,
P. Esser, R. Rombach, and B. Ommer, “Taming transformers for high-resolution image synthesis,” inProc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.