Sculpting NeRF Geometry: Human-Preference Fine-Tuning of a 3D-Aware Face GAN

Archer Moore; Liam Hodgkinson; Mingming Gong

arxiv: 2606.27305 · v1 · pith:SFJXDVRTnew · submitted 2026-06-25 · 💻 cs.CV

Sculpting NeRF Geometry: Human-Preference Fine-Tuning of a 3D-Aware Face GAN

Archer Moore , Mingming Gong , Liam Hodgkinson This is my paper

Pith reviewed 2026-06-26 05:31 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D GANNeRF densityhuman preferenceface geometryRLHFdensity consistencyEG3Dradiance field

0 comments

The pith

Human preferences on NeRF density fine-tune a 3D face GAN to produce geometries preferred in 74.4% of comparisons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a reward model can be trained on human preferences for the density values inside a neural radiance field and then used to fine-tune a pretrained 3D-aware face GAN. The reward operates directly on the continuous density field with no mesh conversion or external shape data required. A density-consistency constraint is applied during fine-tuning so that 2D appearance stays qualitatively similar while the 3D geometry changes. With preference samples from only one annotator the fine-tuned generator wins 74.4% of user pairwise comparisons on geometry, while FID rises from 4.09 to 6.66.

Core claim

Working on an unconditional 3D-aware face GAN, the authors learn a reward directly from the 3D density field of the NeRF and apply it during fine-tuning under a density-consistency constraint. This produces face geometries preferred by users in 74.4% of pairwise comparisons without text conditioning, mesh extraction, or multi-view rendering, at a bounded distributional cost.

What carries the argument

A reward model that reads the continuous 3D density field of the NeRF directly to supply a geometry-only learning signal, together with a density-consistency constraint that preserves 2D appearance during reshaping.

If this is right

Geometry can be reshaped directly inside the radiance field without converting to meshes or supplying external shape priors.
2D appearance remains close enough that FID increases only from 4.09 to 6.66.
A small preference set collected from a single annotator is enough to train an effective reward model.
The same pipeline demonstrates RLHF applied to 3D-aware GANs without multi-view rendering or surface supervision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The density-based reward could be tested on other unconditional 3D generators to see if the approach generalizes beyond faces.
Skipping mesh extraction steps may reduce compute in future preference-tuning pipelines for 3D models.
Very small numbers of annotations per user could enable practical personalization of 3D face generators.
The bounded FID cost suggests the method can be combined with other signals if density consistency proves robust across domains.

Load-bearing premise

The density-consistency constraint keeps the 2D appearance qualitatively similar while the geometry is reshaped.

What would settle it

A larger user study that measures whether the 74.4% preference rate holds across multiple annotators or drops when the density-consistency constraint is ablated and appearance changes become visible.

Figures

Figures reproduced from arXiv: 2606.27305 by Archer Moore, Liam Hodgkinson, Mingming Gong.

**Figure 2.** Figure 2: Geometry and appearance before (left) and after (right) fine-tuning with human feedback, for [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Reward-input slab used for 𝜎𝑋𝑌 𝑍 scoring. The top row shows the crop box inside the full 2563 EG3D sigma cube on three orthogonal slices; the bottom row shows the resulting cropped tensor of shape 128 × 141 × 129. After cropping, each slab is independently rescaled to [0, 100] by normalise_sigma_self. 3.2 Learning a model of 3D shape quality from preference pairs Prior work has shown that a model of 3D sha… view at source ↗

**Figure 4.** Figure 4: Sigma field 𝜎𝑋𝑌 𝑍 visualised from three rotated view angles. Each panel shows the density field of a single EG3D-FFHQ sample, rendered as a volume with the cube outline visible. geometric prior than a radiance field, in which the geometry is only implicit: a mesh surface is explicit, so defects such as an open (unbounded) surface can be characterised and repaired directly, and many methods exist for recons… view at source ↗

**Figure 5.** Figure 5: The reward model 𝑟 𝜃 predicts a quality score 𝑠 from 3D representation 𝑥3𝐷. The module 𝑁 is a domain-specific feature extractor mapping 𝑥3𝐷 to a global feature ®𝑓 . An MLP decodes ®𝑓 into the quality score. Reward-model training loss. The reward-model training loss L𝜃 = L𝑤 is the pairwise prediction loss in Equation (4), which encourages 𝑟 𝜃 to predict the winning example 𝑥𝑤 over ranked pairs in a minibatc… view at source ↗

**Figure 6.** Figure 6: Modification of the generator update step in the GAN loss. The reward model [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Change in 𝜎𝑋𝑌 𝑍 reward distribution after fine-tuning, on 100 paired latent codes at truncation 𝜓 = 0.7. Left: histograms of reward scores before (orig, blue) and after (tuned, red) fine-tuning, with dashed lines marking the means. Right: distribution of per-seed deltas 𝑟 𝜃 (𝐺𝑟 ∗ 𝜃 (𝑧)) − 𝑟 𝜃 (𝐺(𝑧)). All 100/100 deltas are positive with mean +12.89. 4.2.1 External user study Improvements to 3D shape are ev… view at source ↗

**Figure 8.** Figure 8: Per-seed 𝜎𝑋𝑌 𝑍 reward trajectories during fine-tuning, for 200 fixed latent codes, each line coloured by its initial reward score. Left: with the reward loss (𝜆𝑟 = 10) the reward rises and saturates for essentially every seed. Right: the matched no-reward control (𝜆𝑟 = 0) shows no systematic reward change. The mean per-seed reward increase is large under the reward loss and approximately zero for the contr… view at source ↗

**Figure 9.** Figure 9: Final vs. initial 𝜎𝑋𝑌 𝑍 reward for 200 fixed latent codes; the dashed line is 𝑦 =𝑥 (no change). Left: with the reward loss (𝜆𝑟 = 10) every code lies well above 𝑦=𝑥 and the fit is nearly flat (𝑏 = 0.33), so the final reward is almost independent of the starting quality - the reward compresses the distribution toward a common high-quality level while improving all codes. Right: the no-reward control (𝜆𝑟 = 0)… view at source ↗

**Figure 10.** Figure 10: Change in geometry after fine-tuning for three fixed latent codes (seeds [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗

**Figure 11.** Figure 11: UMAP projections of the reward model’s 8,192-d global feature. a Same-𝑧 pairs from 𝐺 (orig) and 𝐺𝑟 ∗ 𝜃 (tuned), colour-coded by model; the two populations are cleanly separated under the learned feature. b 100 samples from the untuned generator 𝐺, colour-coded by 𝜎𝑋𝑌 𝑍 reward score; reward varies smoothly and monotonically across the embedding (Spearman 𝜌 = −0.93 between reward and the principal UMAP axis… view at source ↗

**Figure 12.** Figure 12: Visualisation of the contributions of facial regions to quality scores in depth-map and point [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗

**Figure 13.** Figure 13: Geometries ranked highest a and lowest b by the PointNet point-cloud reward, sampled from the EG3D generator after PointNet-reward fine-tuning. The top-ranked - i.e. preferred - geometries still include clearly defective shapes such as over-sharp noses and irregular surfaces, consistent with the model’s near-chance within-distribution accuracy ( [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗

**Figure 14.** Figure 14: Geometry of the fine-tuned EG3D generator on the same seeds, at the same checkpoint, under [PITH_FULL_IMAGE:figures/full_fig_p021_14.png] view at source ↗

**Figure 15.** Figure 15: Mean Shapley (left) and Integrated Gradients (right) contribution per region for the [PITH_FULL_IMAGE:figures/full_fig_p022_15.png] view at source ↗

**Figure 16.** Figure 16: Marching-cubes mesh from a representative PanoHead sample (seed [PITH_FULL_IMAGE:figures/full_fig_p023_16.png] view at source ↗

**Figure 17.** Figure 17: Distribution of positive 𝜎 density across the five generators (canonical 𝜓 = 0.7, log–log axes). The geometry-bearing high-𝜎 tail occupies a markedly different numerical range per architecture: mean per-seed maximum 𝜎 of roughly 250 (EG3D-orig), 430 (EG3D-tuned), 810 (PanoHead and SphereHead) and 8,200 (HyPlaneHead). A reward model trained on EG3D-FFHQ’s 𝜎 statistics is consequently out-of-distribution on… view at source ↗

**Figure 18.** Figure 18: Within-generator rank consistency of the [PITH_FULL_IMAGE:figures/full_fig_p026_18.png] view at source ↗

**Figure 19.** Figure 19: Canonical-view facial depth diagnostics across the five generators. Top row: mean ray [PITH_FULL_IMAGE:figures/full_fig_p028_19.png] view at source ↗

**Figure 20.** Figure 20: Unstratified top-5 (upper strip) vs bottom-5 (lower strip) mesh tails by 𝜎𝑋𝑌 𝑍 reward, one row of strips per generator (panels (a)–(e)). 𝜎 is sampled at 5123 in memory per seed and marchingcubes-extracted at level 10. On the two EG3D generators (in-domain for the reward) the top/bottom meshes differ visibly in surface integrity; on the three 360◦ generators (out-of-domain for the reward) the top/bottom m… view at source ↗

**Figure 21.** Figure 21: Single-image PTI inversion of SphereHead with EG3D-reward guidance at increasing weight [PITH_FULL_IMAGE:figures/full_fig_p030_21.png] view at source ↗

read the original abstract

Reinforcement learning from human feedback (RLHF) for 3D generation is now established across a number of works, but most existing pipelines optimise explicit surface representations, often by converting radiance fields into meshes and training heavily on surface-supervised data. We instead fine-tune a pretrained 3D-aware generative model directly from a learned reward over radiance-field density ($\sigma$) values, with no externally supplied mesh or shape prior. The reward model requires no pretraining, trains easily on a small set of preference samples, and yields robust improvement in 3D geometry. Working on an unconditional 3D-aware face GAN (EG3D), our reward reads the continuous 3D density field of the neural radiance field (NeRF) directly and supplies a geometry-only learning signal, requiring neither text conditioning, mesh extraction, nor multi-view rendering. A density-consistency constraint keeps the 2D appearance qualitatively similar while the geometry is reshaped, at a measurable but bounded distributional cost (FID-50k rises from 4.09 to 6.66): the fine-tuned generator, trained from the preferences of a single annotator as a proof of concept, produces face geometries preferred by users in 74.4% of pairwise comparisons.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows a density-reward fine-tune on EG3D that reaches 74.4% user preference without mesh steps, but the abstract supplies almost no experimental controls or geometry-specific checks.

read the letter

The core result is that a reward trained directly on NeRF density values from one annotator's preferences can reshape face geometry in a pretrained EG3D model, producing outputs users pick 74.4% of the time over the base model while FID only moves from 4.09 to 6.66.

What is actually new is the decision to skip mesh extraction and surface supervision entirely and apply the reward to the continuous sigma field. That removes a common pipeline step and keeps the method lightweight.

The user-study win is a concrete data point and the density-consistency constraint appears to limit appearance drift enough for the FID number to stay reasonable. Those are the parts that work.

The soft spots are the missing pieces. No architecture or training details for the reward model are given, no statistical tests on the preference rate, and the single-annotator setup is presented as a proof of concept without further validation. The stress-test point holds: without depth-map consistency, normal variance, or multi-view geometry metrics, the preferences could be driven by correlated appearance shifts rather than the intended sigma signal. The abstract does not address that directly.

This is for people already working on 3D-aware GANs or preference tuning who want a simple starting point to experiment with. A reader could try the approach on their own setup, but the current write-up does not give enough to reproduce or extend the geometry claim.

I would send it to peer review because the departure from mesh-based RLHF is clear and the empirical number is there, even though the methods and controls need substantial expansion.

Referee Report

2 major / 1 minor

Summary. The paper presents a method for fine-tuning a pretrained 3D-aware face GAN (EG3D) via RLHF applied directly to the NeRF density field σ, using a reward model trained on human preference samples without mesh extraction or multi-view rendering. A density-consistency constraint is introduced to preserve 2D appearance during geometry reshaping. As a single-annotator proof of concept, the approach yields a 74.4% user preference win rate in pairwise comparisons, with FID-50k rising from 4.09 to 6.66.

Significance. If the central empirical result holds under proper controls, the work demonstrates a lightweight, mesh-free route to geometry improvement in unconditional 3D GANs by optimizing the density field from preferences. The absence of external shape priors and the direct use of the σ-field constitute a clear methodological distinction from mesh-conversion pipelines; the reported FID bound and user-study win rate provide a concrete, falsifiable benchmark.

major comments (2)

[Abstract] Abstract, final paragraph: the claim that the density-consistency constraint 'keeps the 2D appearance qualitatively similar while the geometry is reshaped' is load-bearing for attributing the 74.4% preference gain to geometry rather than correlated appearance shifts, yet no geometry-specific verification (multi-view depth/normal consistency, surface-normal variance, or rendered depth-map comparisons) is described; the moderate FID rise alone does not rule out appearance leakage.
[Abstract] Abstract: the 74.4% pairwise preference result is presented without any description of the reward-model architecture, training procedure, number of preference samples, statistical testing, or how single-annotator labels were collected and validated; these omissions make the central user-study claim impossible to assess for robustness.

minor comments (1)

The manuscript would benefit from an explicit equation or pseudocode block defining the density-consistency loss term and its weighting relative to the preference reward.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript accordingly to improve clarity and support for the central claims.

read point-by-point responses

Referee: [Abstract] Abstract, final paragraph: the claim that the density-consistency constraint 'keeps the 2D appearance qualitatively similar while the geometry is reshaped' is load-bearing for attributing the 74.4% preference gain to geometry rather than correlated appearance shifts, yet no geometry-specific verification (multi-view depth/normal consistency, surface-normal variance, or rendered depth-map comparisons) is described; the moderate FID rise alone does not rule out appearance leakage.

Authors: We agree that the abstract's claim would be strengthened by explicit geometry-specific metrics beyond FID. The density-consistency constraint operates directly on the σ-field to limit appearance drift, but the current manuscript does not report multi-view depth or normal consistency checks. In revision we will add quantitative comparisons of rendered depth maps and surface normals across views in the experiments section, and update the abstract to reference these results. This directly addresses the concern about potential appearance leakage. revision: yes
Referee: [Abstract] Abstract: the 74.4% pairwise preference result is presented without any description of the reward-model architecture, training procedure, number of preference samples, statistical testing, or how single-annotator labels were collected and validated; these omissions make the central user-study claim impossible to assess for robustness.

Authors: The abstract summarizes the result as a single-annotator proof of concept. The full manuscript describes the reward model (a lightweight MLP reading σ values), the collection of preference pairs, and the training procedure in Section 3. However, we acknowledge that the abstract itself lacks these specifics and that statistical testing details (e.g., binomial test p-value) are not highlighted. We will expand the abstract with a concise description of the reward model, sample count, and validation approach, and ensure the main text includes explicit statistical analysis of the 74.4% win rate. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical fine-tuning with external human evaluation

full rationale

The paper describes an empirical RLHF-style fine-tuning procedure on a pretrained EG3D model, using a reward derived from human preference samples over NeRF density values and a density-consistency constraint. Performance is assessed via external user pairwise comparisons (74.4% preference) and FID metric. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains are present in the provided text. The central claim reduces to measured outcomes against independent human judgments rather than any internal reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the reward model and density-consistency constraint are treated as black-box components whose internal structure is not described.

pith-pipeline@v0.9.1-grok · 5758 in / 1153 out tokens · 28490 ms · 2026-06-26T05:31:40.870749+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

77 extracted references · 6 canonical work pages

[1]

Ogras, and Linjie Luo

Sizhe An, Hongyi Xu, Yichun Shi, Guoxian Song, Umit Y. Ogras, and Linjie Luo. PanoHead: Geometry-aware 3D full-head synthesis in 360 degrees. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2023
[2]

Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein

Eric R. Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-GAN: Periodic implicit generative adversarial networks for 3D-aware image synthesis.arXiv preprint arXiv:2012.00926, 2020

arXiv 2012
[3]

Chan, Connor Z

Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J. Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, and Gordon Wetzstein. Efficient geometry-aware 3D generative adversarial networks. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2022
[4]

Beyond prompts: Unconditional 3D inversion for out-of-distribution shapes.arXiv preprint arXiv:2604.14914, 2026

Victoria Yue Chen, Emery Pierson, Léopold Maillard, and Maks Ovsjanikov. Beyond prompts: Unconditional 3D inversion for out-of-distribution shapes.arXiv preprint arXiv:2604.14914, 2026

Pith/arXiv arXiv 2026
[5]

Christiano, Jan Leike, Tom B

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, 2017

2017
[6]

Lienkamp, Thomas Brox, and Olaf Ronneberger

Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. InMedical Image Computing and Computer-Assisted Intervention (MICCAI), pages 424–432, 2016

2016
[7]

Routledge, 2013

Jacob Cohen.Statistical Power Analysis for the Behavioral Sciences. Routledge, 2013

2013
[8]

Morphable face models – an open framework

Thomas Gerig, Andreas Morel-Forster, Clemens Blumer, Bernhard Egger, Marcel Luthi, Sandro Schönborn, and Thomas Vetter. Morphable face models – an open framework. In2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pages 75–82, 2018

2018
[9]

StyleNeRF: A style-based 3D-aware generator for high-resolution image synthesis.arXiv preprint arXiv:2110.08985, 2021

Jiatao Gu, Lingjie Liu, Peng Wang, and Christian Theobalt. StyleNeRF: A style-based 3D-aware generator for high-resolution image synthesis.arXiv preprint arXiv:2110.08985, 2021. 32

arXiv 2021
[10]

HuTuMotion: Human-tuned navigation of latent motion diffusion models with minimal feedback

Gaoge Han, Shaoli Huang, Mingming Gong, and Jinglei Tang. HuTuMotion: Human-tuned navigation of latent motion diffusion models with minimal feedback. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 2031–2039, 2024

2031
[11]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InIEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

2016
[12]

AlecHelbling,ChristopherJ.Rozell,MatthewO’Shaughnessy,andKionFallah.PrefGen: Preference guided image generation with relative attributes.arXiv preprint arXiv:2304.00185, 2023

arXiv 2023
[13]

Squeeze-and-excitationnetworks

JieHu,LiShen,andGangSun. Squeeze-and-excitationnetworks. InIEEEConferenceonComputer Vision and Pattern Recognition, pages 7132–7141, 2018

2018
[14]

DreamControl: Control-based text-to-3D generation with 3D self-prior

Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, WH Lau Ryson, and Wangmeng Zuo. DreamControl: Control-based text-to-3D generation with 3D self-prior. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5364–5373, 2024

2024
[15]

HumanNorm: Learning normal diffusion model for high-quality and realistic 3D human generation

Xin Huang, Ruizhi Shao, Qi Zhang, Hongwen Zhang, Ying Feng, Yebin Liu, and Qing Wang. HumanNorm: Learning normal diffusion model for high-quality and realistic 3D human generation. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4568–4577, 2024

2024
[16]

Pixel-in-pixel net: Towards efficient facial landmark detection in the wild.International Journal of Computer Vision, 129(12):3174–3194, 2021

Haibo Jin, Shengcai Liao, and Ling Shao. Pixel-in-pixel net: Towards efficient facial landmark detection in the wild.International Journal of Computer Vision, 129(12):3174–3194, 2021

2021
[17]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019. Source of the FFHQ dataset used to train EG3D

2019
[18]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. Source of FFHQ dataset

2019
[19]

Alias-free generative adversarial networks

TeroKarras,MiikaAittala,SamuliLaine,ErikHärkönen,JanneHellsten,JaakkoLehtinen,andTimo Aila. Alias-free generative adversarial networks. InAdvances in Neural Information Processing Systems, 2021

2021
[20]

Preference-based image generation

Hadi Kazemi, Fariborz Taherkhani, and Nasser Nasrabadi. Preference-based image generation. In IEEE/CVF Winter Conference on Applications of Computer Vision, 2020

2020
[21]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

Pith/arXiv arXiv 2014
[22]

Understanding the effects of RLHF on LLM generalisation and diversity

Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, and Roberta Raileanu. Understanding the effects of RLHF on LLM generalisation and diversity. InInternational Conference on Learning Representations, 2024

2024
[23]

Pick-a-Pic: An open dataset of user preferences for text-to-image generation

Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-Pic: An open dataset of user preferences for text-to-image generation. InAdvances in Neural Information Processing Systems, 2023

2023
[24]

Fast radiance field reconstruction from sparse inputs.Pattern Recognition, 157, 2025

Song Lai, Linyan Cui, and Jihao Yin. Fast radiance field reconstruction from sparse inputs.Pattern Recognition, 157, 2025. doi: 10.1016/j.patcog.2024.110863

work page doi:10.1016/j.patcog.2024.110863 2025
[25]

LN3Diff: Scalable latent neural fields diffusion for speedy 3D generation

Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. LN3Diff: Scalable latent neural fields diffusion for speedy 3D generation. In European Conference on Computer Vision, 2024. 33

2024
[26]

SphereHead: Stable 3D full-head synthesis with spherical tri-plane representation

Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, and Xiaoguang Han. SphereHead: Stable 3D full-head synthesis with spherical tri-plane representation. InEuropean Conference on Computer Vision, pages 324–341. Springer, 2024

2024
[27]

HyPlaneHead: Rethinking tri-plane-like representations in full-head image synthesis

Heyuan Li, Kenkun Liu, Lingteng Qiu, Qi Zuo, Keru Zheng, Zilong Dong, and Xiaoguang Han. HyPlaneHead: Rethinking tri-plane-like representations in full-head image synthesis. InAdvances in Neural Information Processing Systems, 2025. arXiv:2509.16748

arXiv 2025
[28]

CraftsMan3D:High-fidelitymeshgenerationwith3Dnativediffusionand interactivegeometryrefiner

Weiyu Li, Jiarui Liu, Hongyu Hu, Rui Chen, Yixun Liu, Cheng Tan, Xuan Lin, Jingwei Tang, Junjie Zhao,XiaoxiaoLiu,etal. CraftsMan3D:High-fidelitymeshgenerationwith3Dnativediffusionand interactivegeometryrefiner. InIEEE/CVFConferenceonComputerVisionandPatternRecognition, 2025

2025
[29]

Integrating reinforcement learning with visual generative models: Foundations and advances.Vicinagearth, 3(1):2, 2026

Yuanzhi Liang, Yijie Fang, Rui Li, Ziqi Ni, Ruijie Su, and Chi Zhang. Integrating reinforcement learning with visual generative models: Foundations and advances.Vicinagearth, 3(1):2, 2026. doi: 10.1007/s44336-025-00030-z. arXiv:2508.10316

work page doi:10.1007/s44336-025-00030-z 2026
[30]

GUS-IR: Gaussian Splatting With Unified Shading for Inverse Rendering .IEEE Transactions on Pattern Analysis & Machine Intelligence, 47(10):8364–8378, October 2025

Fangfu Liu, Junliang Ye, Yikai Wang, Hanyang Wang, Zhengyi Wang, Jun Zhu, and Yueqi Duan. DreamReward-X: Boosting high-quality 3D generation with human preference alignment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. doi: 10.1109/TPAMI.2025. 3609680

work page doi:10.1109/tpami.2025 2025
[31]

DreamAlign: Dynamic text-to-3D optimization with human preference alignment

Gaofeng Liu, Zhiyuan Ma, and Tao Fang. DreamAlign: Dynamic text-to-3D optimization with human preference alignment. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 5424–5432, 2025

2025
[32]

Mesh-RFT:Enhancingmeshgeneration via fine-grained reinforcement fine-tuning

Jian Liu, Jing Xu, Song Guo, Jing Li, Jingfeng Guo, Jiaao Yu, Haohan Weng, Biwen Lei, Xianghui Yang,ZhuoChen,FangqiZhu,TaoHan,andChunchaoGuo. Mesh-RFT:Enhancingmeshgeneration via fine-grained reinforcement fine-tuning. InAdvances in Neural Information Processing Systems (Spotlight), 2025

2025
[33]

Nabla-R2D3: Effective and efficient 3D diffusion alignment with 2D rewards.arXiv preprint arXiv:2506.15684, 2025

Qingming Liu, Zhen Liu, Dinghuai Zhang, and Kui Jia. Nabla-R2D3: Effective and efficient 3D diffusion alignment with 2D rewards.arXiv preprint arXiv:2506.15684, 2025

arXiv 2025
[34]

Point cloud quality assessment: Dataset con- struction and learning-based no-reference metric.ACM Transactions on Multimedia Computing, Communications and Applications, 2022

Yipeng Liu, Qi Yang, Yiling Xu, and Le Yang. Point cloud quality assessment: Dataset con- struction and learning-based no-reference metric.ACM Transactions on Multimedia Computing, Communications and Applications, 2022

2022
[35]

Lorensen and Harvey E

William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm.ACM SIGGRAPH Computer Graphics, 21(4):163–169, 1987

1987
[36]

Aunifiedapproachtointerpretingmodelpredictions

ScottM.LundbergandSu-InLee. Aunifiedapproachtointerpretingmodelpredictions. InAdvances in Neural Information Processing Systems, 2017

2017
[37]

Which training methods for GANs do actually converge? InInternational Conference on Machine Learning, pages 3481–3490, 2018

Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for GANs do actually converge? InInternational Conference on Machine Learning, pages 3481–3490, 2018

2018
[38]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. InEuropean Conference on Computer Vision, 2020

2020
[39]

High-fidelity 3D reconstruction via unified NeRF-mesh optimization with geometric and color consistency.Pattern Recognition, 170:112071,

Rama Bastola Neupane, Kan Li, and Zhuqing Mao. High-fidelity 3D reconstruction via unified NeRF-mesh optimization with geometric and color consistency.Pattern Recognition, 170:112071,
[40]

doi: 10.1016/j.patcog.2025.112071. 34

work page doi:10.1016/j.patcog.2025.112071 2025
[41]

Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.arXiv preprint arXiv:2203.02155, 2022

Pith/arXiv arXiv 2022
[42]

Parkhi, Andrea Vedaldi, and Andrew Zisserman

Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. InBritish Machine Vision Conference (BMVC), 2015

2015
[43]

Qi, Hao Su, Kaichun Mo, and Leonidas J

Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. InIEEE Conference on Computer Vision and Pattern Recognition, 2017

2017
[44]

Qi, Li Yi, Hao Su, and Leonidas J

Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. InAdvances in Neural Information Processing Systems, 2017

2017
[45]

Pivotaltuningforlatent-based editing of real images.ACM Transactions on Graphics, 42(1):1–13, 2022

DanielRoich,RonMokady,AmitH.Bermano,andDanielCohen-Or. Pivotaltuningforlatent-based editing of real images.ACM Transactions on Graphics, 42(1):1–13, 2022

2022
[46]

Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

Pith/arXiv arXiv 2017
[47]

GRAF: Generative radiance fields for 3D-aware image synthesis

Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. GRAF: Generative radiance fields for 3D-aware image synthesis. InAdvances in Neural Information Processing Systems, 2020

2020
[48]

Lloyd S. Shapley. A value for n-person games.Contributions to the Theory of Games, 2, 1953

1953
[49]

Deep marching tetrahedra: a hybrid representation for high-resolution 3D shape synthesis

Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. Deep marching tetrahedra: a hybrid representation for high-resolution 3D shape synthesis. InAdvances in Neural Information Processing Systems, 2021

2021
[50]

Deep generative models on 3D representations: A survey.arXiv preprint arXiv:2210.15663, 2022

Zifan Shi, Sida Peng, Yinghao Xu, Yiyi Liao, and Yujun Shen. Deep generative models on 3D representations: A survey.arXiv preprint arXiv:2210.15663, 2022

arXiv 2022
[51]

Improving 3D-aware image synthesis with a geometry-aware discriminator

Zifan Shi, Yinghao Shen, Yujun Xu, Yiyi Liao, Deli Yueqian, Qifeng Zhao, and Dit-Yan Yeung. Improving 3D-aware image synthesis with a geometry-aware discriminator. InAdvances in Neural Information Processing Systems, 2022

2022
[52]

Very deep convolutional networks for large-scale image recognition.arXiv preprint arXiv:1409.1556, 2014

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition.arXiv preprint arXiv:1409.1556, 2014

Pith/arXiv arXiv 2014
[53]

EpiGRAF: Rethinking training of 3D GANs

Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, and Peter Wonka. EpiGRAF: Rethinking training of 3D GANs. InAdvances in Neural Information Processing Systems, 2022

2022
[54]

Learning to summarize from human feedback.arXiv preprint arXiv:2009.01325, 2020

NisanStiennon, LongOuyang, JeffWu, DanielM.Ziegler, RyanLowe, ChelseaVoss, AlecRadford, Dario Amodei, and Paul Christiano. Learning to summarize from human feedback.arXiv preprint arXiv:2009.01325, 2020

Pith/arXiv arXiv 2009
[55]

Axiomatic attribution for deep networks

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328, 2017

2017
[56]

Zeroth-order optimization meets human feedback: Provable learning via ranking oracles.arXiv preprint arXiv:2303.03751, 2023

Zhiwei Tang, Dmitry Rybin, and Tsung-Hui Chang. Zeroth-order optimization meets human feedback: Provable learning via ranking oracles.arXiv preprint arXiv:2303.03751, 2023

arXiv 2023
[57]

Deep learning semantic segmentation for high-resolution medical volumes

Imad Eddine Toubal, Ye Duan, and Deshan Yang. Deep learning semantic segmentation for high-resolution medical volumes. In2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pages 1–9. IEEE, 2020. 35

2020
[58]

MVReward: Better aligning and evaluating multi-view diffusion models with human preferences

Weitao Wang, Haoran Xu, Yuxiao Yang, Zhifang Liu, Jun Meng, and Haoqian Wang. MVReward: Better aligning and evaluating multi-view diffusion models with human preferences. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 7898–7906, 2025

2025
[59]

Single image, any face: Generalisable 3D face generation.Pattern Recognition, 178, 2026

Wenqing Wang, Haosen Yang, Josef Kittler, and Xiatian Zhu. Single image, any face: Generalisable 3D face generation.Pattern Recognition, 178, 2026. doi: 10.1016/j.patcog.2026.113375. In press; preprint: arXiv:2409.16990

work page doi:10.1016/j.patcog.2026.113375 2026
[60]

LLaMA-Mesh: Unifying 3D mesh generation with language models.arXiv preprint arXiv:2411.09595, 2024

Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, and Xiaohui Zeng. LLaMA-Mesh: Unifying 3D mesh generation with language models.arXiv preprint arXiv:2411.09595, 2024

arXiv 2024
[61]

Accurate and versatile 3D segmentation of plant tissues at cellular resolution.Elife, 9:e57613, 2020

Adrian Wolny, Lorenzo Cerrone, Athul Vijayan, Rachele Tofanelli, Amaya Vilches Barro, Marion Louveaux, Christian Wenzl, Sören Strauss, David Wilson-Sánchez, Rena Lymbouridou, et al. Accurate and versatile 3D segmentation of plant tissues at cellular resolution.Elife, 9:e57613, 2020

2020
[62]

GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation

Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, and Gordon Wetzstein. GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22227–22238, 2024

2024
[63]

Look at boundary: A boundary-aware face alignment algorithm

Wayne Wu, Chen Qian, Shuo Yang, Quan Wang, Yici Cai, and Qiang Zhou. Look at boundary: A boundary-aware face alignment algorithm. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2129–2138, 2018

2018
[64]

Reinforcement learning for large model: A survey.arXiv preprint arXiv:2508.08189, 2025

Weijia Wu, Chen Gao, Joya Chen, Kevin Qinghong Lin, Qingwei Meng, Yiming Zhang, Yuke Qiu, Hong Zhou, and Mike Zheng Shou. Reinforcement learning for large model: A survey.arXiv preprint arXiv:2508.08189, 2025

arXiv 2025
[65]

Structured 3D latents for scalable and versatile 3D generation.arXiv preprint arXiv:2412.01506, 2024

Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3D latents for scalable and versatile 3D generation.arXiv preprint arXiv:2412.01506, 2024

Pith/arXiv arXiv 2024
[66]

Nativeandcompactstructuredlatents for 3D generation.arXiv preprint arXiv:2512.14692, 2025

Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, YueDong,HaoZhao,NicholasJingYuan,andJiaolongYang. Nativeandcompactstructuredlatents for 3D generation.arXiv preprint arXiv:2512.14692, 2025. Trellis 2

Pith/arXiv arXiv 2025
[67]

Walkinthecloud: Learning curves for point clouds shape analysis

TiangeXiang,ChaoyiZhang,YangSong,JianhuiYu,andWeidongCai. Walkinthecloud: Learning curves for point clouds shape analysis. InIEEE/CVF International Conference on Computer Vision, pages 915–924, 2021

2021
[68]

ImageReward: Learning and evaluating human preferences for text-to-image generation

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. In Advances in Neural Information Processing Systems, 2023

2023
[69]

FacialTalk: Audio-driven high-fidelity facial portrait generation using 3D facial prior.Pattern Recognition, 171:111994, 2026

Daowu Yang, Ying Liu, Qiyun Yang, and Ruihui Li. FacialTalk: Audio-driven high-fidelity facial portrait generation using 3D facial prior.Pattern Recognition, 171:111994, 2026. ISSN 0031-3203. doi: 10.1016/j.patcog.2025.111994

work page doi:10.1016/j.patcog.2025.111994 2026
[70]

Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging

Chongjie Ye, Yushuang Wu, Ziteng Lu, Jiahao Chang, Xiaoyang Guo, Jiaqing Zhou, Hao Zhao, and Xiaoguang Han. Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging. arXiv preprint arXiv:2503.22236, 2025

arXiv 2025
[71]

DreamReward: Text-to-3D generation with human preference

Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, and Jun Zhu. DreamReward: Text-to-3D generation with human preference. InEuropean Conference on Computer Vision, 2024. 36

2024
[72]

GaussianCube: Astructuredandexplicitradiancerepresentationfor3Dgenerative modeling

Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, andBainingGuo. GaussianCube: Astructuredandexplicitradiancerepresentationfor3Dgenerative modeling. InAdvances in Neural Information Processing Systems, 2024

2024
[73]

Efros, Eli Shechtman, and Oliver Wang

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018

2018
[74]

Benchmarking and learning multi- dimensional quality evaluator for text-to-3D generation

Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, and Yiling Xu. Benchmarking and learning multi- dimensional quality evaluator for text-to-3D generation. InIEEE/CVF International Conference on Computer Vision, pages 18563–18574, October 2025

2025
[75]

DeepMesh: Auto-regressive artist-mesh creation with reinforcement learning

Ruowen Zhao, Junliang Ye, Zhengyi Wang, Guangce Liu, Yiwen Chen, Yikai Wang, and Jun Zhu. DeepMesh: Auto-regressive artist-mesh creation with reinforcement learning. InIEEE/CVF International Conference on Computer Vision, 2025

2025
[76]

DreamDPO:Aligning text-to-3D generation with human preferences via direct preference optimization

ZhenglinZhou,XiaoboXia,FanMa,HeheFan,YiYang,andTat-SengChua. DreamDPO:Aligning text-to-3D generation with human preferences via direct preference optimization. InInternational Conference on Machine Learning, 2025

2025
[77]

DreamCS:Geometry-awaretext-to-3D generation with unpaired 3D reward supervision.arXiv preprint arXiv:2506.09814, 2025

XiandongZou, RuihaoXia,HongsongWang, andPanZhou. DreamCS:Geometry-awaretext-to-3D generation with unpaired 3D reward supervision.arXiv preprint arXiv:2506.09814, 2025. 37

arXiv 2025

[1] [1]

Ogras, and Linjie Luo

Sizhe An, Hongyi Xu, Yichun Shi, Guoxian Song, Umit Y. Ogras, and Linjie Luo. PanoHead: Geometry-aware 3D full-head synthesis in 360 degrees. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2023

[2] [2]

Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein

Eric R. Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-GAN: Periodic implicit generative adversarial networks for 3D-aware image synthesis.arXiv preprint arXiv:2012.00926, 2020

arXiv 2012

[3] [3]

Chan, Connor Z

Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J. Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, and Gordon Wetzstein. Efficient geometry-aware 3D generative adversarial networks. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2022

[4] [4]

Beyond prompts: Unconditional 3D inversion for out-of-distribution shapes.arXiv preprint arXiv:2604.14914, 2026

Victoria Yue Chen, Emery Pierson, Léopold Maillard, and Maks Ovsjanikov. Beyond prompts: Unconditional 3D inversion for out-of-distribution shapes.arXiv preprint arXiv:2604.14914, 2026

Pith/arXiv arXiv 2026

[5] [5]

Christiano, Jan Leike, Tom B

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, 2017

2017

[6] [6]

Lienkamp, Thomas Brox, and Olaf Ronneberger

Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. InMedical Image Computing and Computer-Assisted Intervention (MICCAI), pages 424–432, 2016

2016

[7] [7]

Routledge, 2013

Jacob Cohen.Statistical Power Analysis for the Behavioral Sciences. Routledge, 2013

2013

[8] [8]

Morphable face models – an open framework

Thomas Gerig, Andreas Morel-Forster, Clemens Blumer, Bernhard Egger, Marcel Luthi, Sandro Schönborn, and Thomas Vetter. Morphable face models – an open framework. In2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pages 75–82, 2018

2018

[9] [9]

StyleNeRF: A style-based 3D-aware generator for high-resolution image synthesis.arXiv preprint arXiv:2110.08985, 2021

Jiatao Gu, Lingjie Liu, Peng Wang, and Christian Theobalt. StyleNeRF: A style-based 3D-aware generator for high-resolution image synthesis.arXiv preprint arXiv:2110.08985, 2021. 32

arXiv 2021

[10] [10]

HuTuMotion: Human-tuned navigation of latent motion diffusion models with minimal feedback

Gaoge Han, Shaoli Huang, Mingming Gong, and Jinglei Tang. HuTuMotion: Human-tuned navigation of latent motion diffusion models with minimal feedback. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 2031–2039, 2024

2031

[11] [11]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InIEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

2016

[12] [12]

AlecHelbling,ChristopherJ.Rozell,MatthewO’Shaughnessy,andKionFallah.PrefGen: Preference guided image generation with relative attributes.arXiv preprint arXiv:2304.00185, 2023

arXiv 2023

[13] [13]

Squeeze-and-excitationnetworks

JieHu,LiShen,andGangSun. Squeeze-and-excitationnetworks. InIEEEConferenceonComputer Vision and Pattern Recognition, pages 7132–7141, 2018

2018

[14] [14]

DreamControl: Control-based text-to-3D generation with 3D self-prior

Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, WH Lau Ryson, and Wangmeng Zuo. DreamControl: Control-based text-to-3D generation with 3D self-prior. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5364–5373, 2024

2024

[15] [15]

HumanNorm: Learning normal diffusion model for high-quality and realistic 3D human generation

Xin Huang, Ruizhi Shao, Qi Zhang, Hongwen Zhang, Ying Feng, Yebin Liu, and Qing Wang. HumanNorm: Learning normal diffusion model for high-quality and realistic 3D human generation. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4568–4577, 2024

2024

[16] [16]

Pixel-in-pixel net: Towards efficient facial landmark detection in the wild.International Journal of Computer Vision, 129(12):3174–3194, 2021

Haibo Jin, Shengcai Liao, and Ling Shao. Pixel-in-pixel net: Towards efficient facial landmark detection in the wild.International Journal of Computer Vision, 129(12):3174–3194, 2021

2021

[17] [17]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019. Source of the FFHQ dataset used to train EG3D

2019

[18] [18]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. Source of FFHQ dataset

2019

[19] [19]

Alias-free generative adversarial networks

TeroKarras,MiikaAittala,SamuliLaine,ErikHärkönen,JanneHellsten,JaakkoLehtinen,andTimo Aila. Alias-free generative adversarial networks. InAdvances in Neural Information Processing Systems, 2021

2021

[20] [20]

Preference-based image generation

Hadi Kazemi, Fariborz Taherkhani, and Nasser Nasrabadi. Preference-based image generation. In IEEE/CVF Winter Conference on Applications of Computer Vision, 2020

2020

[21] [21]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

Pith/arXiv arXiv 2014

[22] [22]

Understanding the effects of RLHF on LLM generalisation and diversity

Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, and Roberta Raileanu. Understanding the effects of RLHF on LLM generalisation and diversity. InInternational Conference on Learning Representations, 2024

2024

[23] [23]

Pick-a-Pic: An open dataset of user preferences for text-to-image generation

Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-Pic: An open dataset of user preferences for text-to-image generation. InAdvances in Neural Information Processing Systems, 2023

2023

[24] [24]

Fast radiance field reconstruction from sparse inputs.Pattern Recognition, 157, 2025

Song Lai, Linyan Cui, and Jihao Yin. Fast radiance field reconstruction from sparse inputs.Pattern Recognition, 157, 2025. doi: 10.1016/j.patcog.2024.110863

work page doi:10.1016/j.patcog.2024.110863 2025

[25] [25]

LN3Diff: Scalable latent neural fields diffusion for speedy 3D generation

Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. LN3Diff: Scalable latent neural fields diffusion for speedy 3D generation. In European Conference on Computer Vision, 2024. 33

2024

[26] [26]

SphereHead: Stable 3D full-head synthesis with spherical tri-plane representation

Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, and Xiaoguang Han. SphereHead: Stable 3D full-head synthesis with spherical tri-plane representation. InEuropean Conference on Computer Vision, pages 324–341. Springer, 2024

2024

[27] [27]

HyPlaneHead: Rethinking tri-plane-like representations in full-head image synthesis

Heyuan Li, Kenkun Liu, Lingteng Qiu, Qi Zuo, Keru Zheng, Zilong Dong, and Xiaoguang Han. HyPlaneHead: Rethinking tri-plane-like representations in full-head image synthesis. InAdvances in Neural Information Processing Systems, 2025. arXiv:2509.16748

arXiv 2025

[28] [28]

CraftsMan3D:High-fidelitymeshgenerationwith3Dnativediffusionand interactivegeometryrefiner

Weiyu Li, Jiarui Liu, Hongyu Hu, Rui Chen, Yixun Liu, Cheng Tan, Xuan Lin, Jingwei Tang, Junjie Zhao,XiaoxiaoLiu,etal. CraftsMan3D:High-fidelitymeshgenerationwith3Dnativediffusionand interactivegeometryrefiner. InIEEE/CVFConferenceonComputerVisionandPatternRecognition, 2025

2025

[29] [29]

Integrating reinforcement learning with visual generative models: Foundations and advances.Vicinagearth, 3(1):2, 2026

Yuanzhi Liang, Yijie Fang, Rui Li, Ziqi Ni, Ruijie Su, and Chi Zhang. Integrating reinforcement learning with visual generative models: Foundations and advances.Vicinagearth, 3(1):2, 2026. doi: 10.1007/s44336-025-00030-z. arXiv:2508.10316

work page doi:10.1007/s44336-025-00030-z 2026

[30] [30]

GUS-IR: Gaussian Splatting With Unified Shading for Inverse Rendering .IEEE Transactions on Pattern Analysis & Machine Intelligence, 47(10):8364–8378, October 2025

Fangfu Liu, Junliang Ye, Yikai Wang, Hanyang Wang, Zhengyi Wang, Jun Zhu, and Yueqi Duan. DreamReward-X: Boosting high-quality 3D generation with human preference alignment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. doi: 10.1109/TPAMI.2025. 3609680

work page doi:10.1109/tpami.2025 2025

[31] [31]

DreamAlign: Dynamic text-to-3D optimization with human preference alignment

Gaofeng Liu, Zhiyuan Ma, and Tao Fang. DreamAlign: Dynamic text-to-3D optimization with human preference alignment. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 5424–5432, 2025

2025

[32] [32]

Mesh-RFT:Enhancingmeshgeneration via fine-grained reinforcement fine-tuning

Jian Liu, Jing Xu, Song Guo, Jing Li, Jingfeng Guo, Jiaao Yu, Haohan Weng, Biwen Lei, Xianghui Yang,ZhuoChen,FangqiZhu,TaoHan,andChunchaoGuo. Mesh-RFT:Enhancingmeshgeneration via fine-grained reinforcement fine-tuning. InAdvances in Neural Information Processing Systems (Spotlight), 2025

2025

[33] [33]

Nabla-R2D3: Effective and efficient 3D diffusion alignment with 2D rewards.arXiv preprint arXiv:2506.15684, 2025

Qingming Liu, Zhen Liu, Dinghuai Zhang, and Kui Jia. Nabla-R2D3: Effective and efficient 3D diffusion alignment with 2D rewards.arXiv preprint arXiv:2506.15684, 2025

arXiv 2025

[34] [34]

Point cloud quality assessment: Dataset con- struction and learning-based no-reference metric.ACM Transactions on Multimedia Computing, Communications and Applications, 2022

Yipeng Liu, Qi Yang, Yiling Xu, and Le Yang. Point cloud quality assessment: Dataset con- struction and learning-based no-reference metric.ACM Transactions on Multimedia Computing, Communications and Applications, 2022

2022

[35] [35]

Lorensen and Harvey E

William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm.ACM SIGGRAPH Computer Graphics, 21(4):163–169, 1987

1987

[36] [36]

Aunifiedapproachtointerpretingmodelpredictions

ScottM.LundbergandSu-InLee. Aunifiedapproachtointerpretingmodelpredictions. InAdvances in Neural Information Processing Systems, 2017

2017

[37] [37]

Which training methods for GANs do actually converge? InInternational Conference on Machine Learning, pages 3481–3490, 2018

Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for GANs do actually converge? InInternational Conference on Machine Learning, pages 3481–3490, 2018

2018

[38] [38]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. InEuropean Conference on Computer Vision, 2020

2020

[39] [39]

High-fidelity 3D reconstruction via unified NeRF-mesh optimization with geometric and color consistency.Pattern Recognition, 170:112071,

Rama Bastola Neupane, Kan Li, and Zhuqing Mao. High-fidelity 3D reconstruction via unified NeRF-mesh optimization with geometric and color consistency.Pattern Recognition, 170:112071,

[40] [40]

doi: 10.1016/j.patcog.2025.112071. 34

work page doi:10.1016/j.patcog.2025.112071 2025

[41] [41]

Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.arXiv preprint arXiv:2203.02155, 2022

Pith/arXiv arXiv 2022

[42] [42]

Parkhi, Andrea Vedaldi, and Andrew Zisserman

Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. InBritish Machine Vision Conference (BMVC), 2015

2015

[43] [43]

Qi, Hao Su, Kaichun Mo, and Leonidas J

Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. InIEEE Conference on Computer Vision and Pattern Recognition, 2017

2017

[44] [44]

Qi, Li Yi, Hao Su, and Leonidas J

Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. InAdvances in Neural Information Processing Systems, 2017

2017

[45] [45]

Pivotaltuningforlatent-based editing of real images.ACM Transactions on Graphics, 42(1):1–13, 2022

DanielRoich,RonMokady,AmitH.Bermano,andDanielCohen-Or. Pivotaltuningforlatent-based editing of real images.ACM Transactions on Graphics, 42(1):1–13, 2022

2022

[46] [46]

Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

Pith/arXiv arXiv 2017

[47] [47]

GRAF: Generative radiance fields for 3D-aware image synthesis

Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. GRAF: Generative radiance fields for 3D-aware image synthesis. InAdvances in Neural Information Processing Systems, 2020

2020

[48] [48]

Lloyd S. Shapley. A value for n-person games.Contributions to the Theory of Games, 2, 1953

1953

[49] [49]

Deep marching tetrahedra: a hybrid representation for high-resolution 3D shape synthesis

Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. Deep marching tetrahedra: a hybrid representation for high-resolution 3D shape synthesis. InAdvances in Neural Information Processing Systems, 2021

2021

[50] [50]

Deep generative models on 3D representations: A survey.arXiv preprint arXiv:2210.15663, 2022

Zifan Shi, Sida Peng, Yinghao Xu, Yiyi Liao, and Yujun Shen. Deep generative models on 3D representations: A survey.arXiv preprint arXiv:2210.15663, 2022

arXiv 2022

[51] [51]

Improving 3D-aware image synthesis with a geometry-aware discriminator

Zifan Shi, Yinghao Shen, Yujun Xu, Yiyi Liao, Deli Yueqian, Qifeng Zhao, and Dit-Yan Yeung. Improving 3D-aware image synthesis with a geometry-aware discriminator. InAdvances in Neural Information Processing Systems, 2022

2022

[52] [52]

Very deep convolutional networks for large-scale image recognition.arXiv preprint arXiv:1409.1556, 2014

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition.arXiv preprint arXiv:1409.1556, 2014

Pith/arXiv arXiv 2014

[53] [53]

EpiGRAF: Rethinking training of 3D GANs

Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, and Peter Wonka. EpiGRAF: Rethinking training of 3D GANs. InAdvances in Neural Information Processing Systems, 2022

2022

[54] [54]

Learning to summarize from human feedback.arXiv preprint arXiv:2009.01325, 2020

NisanStiennon, LongOuyang, JeffWu, DanielM.Ziegler, RyanLowe, ChelseaVoss, AlecRadford, Dario Amodei, and Paul Christiano. Learning to summarize from human feedback.arXiv preprint arXiv:2009.01325, 2020

Pith/arXiv arXiv 2009

[55] [55]

Axiomatic attribution for deep networks

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328, 2017

2017

[56] [56]

Zeroth-order optimization meets human feedback: Provable learning via ranking oracles.arXiv preprint arXiv:2303.03751, 2023

Zhiwei Tang, Dmitry Rybin, and Tsung-Hui Chang. Zeroth-order optimization meets human feedback: Provable learning via ranking oracles.arXiv preprint arXiv:2303.03751, 2023

arXiv 2023

[57] [57]

Deep learning semantic segmentation for high-resolution medical volumes

Imad Eddine Toubal, Ye Duan, and Deshan Yang. Deep learning semantic segmentation for high-resolution medical volumes. In2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pages 1–9. IEEE, 2020. 35

2020

[58] [58]

MVReward: Better aligning and evaluating multi-view diffusion models with human preferences

Weitao Wang, Haoran Xu, Yuxiao Yang, Zhifang Liu, Jun Meng, and Haoqian Wang. MVReward: Better aligning and evaluating multi-view diffusion models with human preferences. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 7898–7906, 2025

2025

[59] [59]

Single image, any face: Generalisable 3D face generation.Pattern Recognition, 178, 2026

Wenqing Wang, Haosen Yang, Josef Kittler, and Xiatian Zhu. Single image, any face: Generalisable 3D face generation.Pattern Recognition, 178, 2026. doi: 10.1016/j.patcog.2026.113375. In press; preprint: arXiv:2409.16990

work page doi:10.1016/j.patcog.2026.113375 2026

[60] [60]

LLaMA-Mesh: Unifying 3D mesh generation with language models.arXiv preprint arXiv:2411.09595, 2024

Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, and Xiaohui Zeng. LLaMA-Mesh: Unifying 3D mesh generation with language models.arXiv preprint arXiv:2411.09595, 2024

arXiv 2024

[61] [61]

Accurate and versatile 3D segmentation of plant tissues at cellular resolution.Elife, 9:e57613, 2020

Adrian Wolny, Lorenzo Cerrone, Athul Vijayan, Rachele Tofanelli, Amaya Vilches Barro, Marion Louveaux, Christian Wenzl, Sören Strauss, David Wilson-Sánchez, Rena Lymbouridou, et al. Accurate and versatile 3D segmentation of plant tissues at cellular resolution.Elife, 9:e57613, 2020

2020

[62] [62]

GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation

Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, and Gordon Wetzstein. GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22227–22238, 2024

2024

[63] [63]

Look at boundary: A boundary-aware face alignment algorithm

Wayne Wu, Chen Qian, Shuo Yang, Quan Wang, Yici Cai, and Qiang Zhou. Look at boundary: A boundary-aware face alignment algorithm. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2129–2138, 2018

2018

[64] [64]

Reinforcement learning for large model: A survey.arXiv preprint arXiv:2508.08189, 2025

Weijia Wu, Chen Gao, Joya Chen, Kevin Qinghong Lin, Qingwei Meng, Yiming Zhang, Yuke Qiu, Hong Zhou, and Mike Zheng Shou. Reinforcement learning for large model: A survey.arXiv preprint arXiv:2508.08189, 2025

arXiv 2025

[65] [65]

Structured 3D latents for scalable and versatile 3D generation.arXiv preprint arXiv:2412.01506, 2024

Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3D latents for scalable and versatile 3D generation.arXiv preprint arXiv:2412.01506, 2024

Pith/arXiv arXiv 2024

[66] [66]

Nativeandcompactstructuredlatents for 3D generation.arXiv preprint arXiv:2512.14692, 2025

Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, YueDong,HaoZhao,NicholasJingYuan,andJiaolongYang. Nativeandcompactstructuredlatents for 3D generation.arXiv preprint arXiv:2512.14692, 2025. Trellis 2

Pith/arXiv arXiv 2025

[67] [67]

Walkinthecloud: Learning curves for point clouds shape analysis

TiangeXiang,ChaoyiZhang,YangSong,JianhuiYu,andWeidongCai. Walkinthecloud: Learning curves for point clouds shape analysis. InIEEE/CVF International Conference on Computer Vision, pages 915–924, 2021

2021

[68] [68]

ImageReward: Learning and evaluating human preferences for text-to-image generation

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. In Advances in Neural Information Processing Systems, 2023

2023

[69] [69]

FacialTalk: Audio-driven high-fidelity facial portrait generation using 3D facial prior.Pattern Recognition, 171:111994, 2026

Daowu Yang, Ying Liu, Qiyun Yang, and Ruihui Li. FacialTalk: Audio-driven high-fidelity facial portrait generation using 3D facial prior.Pattern Recognition, 171:111994, 2026. ISSN 0031-3203. doi: 10.1016/j.patcog.2025.111994

work page doi:10.1016/j.patcog.2025.111994 2026

[70] [70]

Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging

Chongjie Ye, Yushuang Wu, Ziteng Lu, Jiahao Chang, Xiaoyang Guo, Jiaqing Zhou, Hao Zhao, and Xiaoguang Han. Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging. arXiv preprint arXiv:2503.22236, 2025

arXiv 2025

[71] [71]

DreamReward: Text-to-3D generation with human preference

Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, and Jun Zhu. DreamReward: Text-to-3D generation with human preference. InEuropean Conference on Computer Vision, 2024. 36

2024

[72] [72]

GaussianCube: Astructuredandexplicitradiancerepresentationfor3Dgenerative modeling

Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, andBainingGuo. GaussianCube: Astructuredandexplicitradiancerepresentationfor3Dgenerative modeling. InAdvances in Neural Information Processing Systems, 2024

2024

[73] [73]

Efros, Eli Shechtman, and Oliver Wang

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018

2018

[74] [74]

Benchmarking and learning multi- dimensional quality evaluator for text-to-3D generation

Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, and Yiling Xu. Benchmarking and learning multi- dimensional quality evaluator for text-to-3D generation. InIEEE/CVF International Conference on Computer Vision, pages 18563–18574, October 2025

2025

[75] [75]

DeepMesh: Auto-regressive artist-mesh creation with reinforcement learning

Ruowen Zhao, Junliang Ye, Zhengyi Wang, Guangce Liu, Yiwen Chen, Yikai Wang, and Jun Zhu. DeepMesh: Auto-regressive artist-mesh creation with reinforcement learning. InIEEE/CVF International Conference on Computer Vision, 2025

2025

[76] [76]

DreamDPO:Aligning text-to-3D generation with human preferences via direct preference optimization

ZhenglinZhou,XiaoboXia,FanMa,HeheFan,YiYang,andTat-SengChua. DreamDPO:Aligning text-to-3D generation with human preferences via direct preference optimization. InInternational Conference on Machine Learning, 2025

2025

[77] [77]

DreamCS:Geometry-awaretext-to-3D generation with unpaired 3D reward supervision.arXiv preprint arXiv:2506.09814, 2025

XiandongZou, RuihaoXia,HongsongWang, andPanZhou. DreamCS:Geometry-awaretext-to-3D generation with unpaired 3D reward supervision.arXiv preprint arXiv:2506.09814, 2025. 37

arXiv 2025