pith. machine review for the scientific record

arxiv: 2605.05155 · v1 · submitted 2026-05-06 · 💻 cs.CV · cs.AI

Recognition: unknown

Aes3D: Aesthetic Assessment in 3D Gaussian Splatting

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 16:49 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords 3D Gaussian Splatting · Aesthetic Assessment · 3D Scene Evaluation · Neural Rendering · Aesthetic Dataset · Lightweight Model · Immersive Media

The pith

A lightweight model predicts aesthetic scores for 3D scenes directly from Gaussian splat primitives without rendering images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the gap in evaluating 3D neural rendering scenes for qualities like composition, harmony, and visual appeal rather than just reconstruction accuracy. It introduces the Aes3D framework, which includes a new dataset called Aesthetic3D annotated specifically for 3D scene aesthetics and a model called Aes3DGSNet that learns to regress scene-level scores from multi-view 3D Gaussian representations. The approach uses aesthetics-supervised learning to capture high-level cues from low-level primitives, and experiments show it delivers strong performance in a compact form. If correct, this would let creators assess and refine 3D content for appeal during development while cutting the need for full image rendering pipelines. It also sets an initial benchmark for systematic 3D aesthetic assessment in immersive media.

Core claim

We propose Aes3D, the first systematic framework for assessing the aesthetics of 3D neural rendering scenes. Aes3D includes Aesthetic3D, the first dataset dedicated to 3D scene aesthetic assessment, built on our proposed annotation strategy for 3D scene aesthetics. In addition, we present Aes3DGSNet, a lightweight model that directly predicts scene-level aesthetic scores from 3DGS representations. Notably, our model operates solely on 3D Gaussian primitives, eliminating the need for rendering multi-view images and thus reducing computational cost and hardware requirements. Through aesthetics-supervised learning on multi-view 3DGS scene representations, Aes3DGSNet effectively captures high-level aesthetic cues and accurately regresses aesthetic scores.

What carries the argument

Aes3DGSNet, a lightweight network that takes 3D Gaussian primitives as input and regresses scene-level aesthetic scores via aesthetics-supervised learning on multi-view representations.
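
To make the data flow concrete, here is a minimal sketch of what a permutation-invariant regressor over raw Gaussian attributes could look like. This is not the paper's Aes3DGSNet (the available text does not specify the architecture); the layer widths, the mean/max pooling, and the 59-dimensional per-primitive feature layout are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of scoring a scene from its unordered Gaussian primitives.
# Not the paper's Aes3DGSNet: layer widths, the pooling choice, and the 59-d
# per-primitive feature layout are assumptions for illustration only.
import torch
import torch.nn as nn

class SetAestheticRegressor(nn.Module):
    def __init__(self, in_dim: int = 59, hidden: int = 128):
        super().__init__()
        # Shared encoder applied independently to every Gaussian primitive.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Scene-level head on top of an order-invariant pooled representation.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, gaussians: torch.Tensor) -> torch.Tensor:
        # gaussians: (B, N, in_dim) -- N primitives per scene, in any order.
        h = self.encoder(gaussians)                                  # (B, N, hidden)
        pooled = torch.cat([h.mean(dim=1), h.amax(dim=1)], dim=-1)  # (B, 2*hidden)
        return self.head(pooled).squeeze(-1)                        # (B,) scene scores

# Example: 2 scenes, 4096 Gaussians each, 59 raw attributes per primitive.
scores = SetAestheticRegressor()(torch.randn(2, 4096, 59))
```

The point of the pooling step is that the prediction cannot depend on primitive order or on any particular rendered view, mirroring the claim that no image rendering is needed at inference.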

If this is right

  • Creators of 3D content can obtain aesthetic feedback without rendering full multi-view images, lowering compute and hardware demands.
  • The Aesthetic3D dataset provides a public resource for training and evaluating future 3D aesthetic assessment methods.
  • Aes3DGSNet establishes a new performance benchmark for lightweight, direct-from-primitives aesthetic scoring in 3DGS scenes.
  • The method supports iterative refinement of visually compelling 3D scenes in immersive media and digital content pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same primitive-to-score mapping could be adapted to other explicit 3D representations such as point clouds or meshes if the input encoding is adjusted.
  • Automated aesthetic optimization loops could be built on top of the scorer to adjust Gaussian parameters toward higher predicted appeal (a hypothetical sketch follows this list).
  • Extending the annotation strategy to dynamic or animated 3DGS scenes would test whether the learned cues generalize beyond static views.
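
As a purely hypothetical illustration of the second bullet, an optimization loop could take gradient steps on the Gaussian parameters to ascend the predicted score. It assumes a differentiable scorer such as the one sketched earlier; nothing of this sort appears in the paper.

```python
# Hypothetical sketch only: nudge a scene's Gaussian parameters toward a higher
# predicted aesthetic score. Assumes a differentiable scorer such as the
# SetAestheticRegressor sketched above; this loop is not from the paper.
import torch

def refine_scene(gaussians: torch.Tensor, scorer, steps: int = 100, lr: float = 1e-3):
    # gaussians: (N, D) raw primitive attributes for a single scene.
    params = gaussians.clone().requires_grad_(True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = scorer(params.unsqueeze(0)).squeeze(0)  # predicted scene score
        (-score).backward()                             # gradient ascent on the score
        opt.step()
    return params.detach()
```

In practice such a loop would also need constraints (e.g., keeping rotation quaternions normalized and opacities in range), which are omitted here for brevity.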

Load-bearing premise

High-level aesthetic attributes such as composition and harmony can be accurately regressed from the raw low-level attributes of 3D Gaussian primitives alone.
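
For reference, the "raw low-level attributes" in question are the standard 3DGS primitive parameters from Kerbl et al. (ref [25]); the flattening below into a single feature vector is an illustrative choice, not the paper's encoding.

```python
# The standard per-primitive 3DGS attributes the premise refers to (Kerbl et al.,
# ref [25]). Flattening them into one feature vector is an illustrative choice.
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    mean: np.ndarray       # (3,)  position in world space
    scale: np.ndarray      # (3,)  per-axis extent
    rotation: np.ndarray   # (4,)  unit quaternion
    opacity: float         #       scalar in [0, 1]
    sh_coeffs: np.ndarray  # (48,) RGB spherical-harmonic color coefficients (degree 3)

    def as_feature(self) -> np.ndarray:
        # One flat low-level feature vector per primitive (3 + 3 + 4 + 1 + 48 = 59 dims).
        return np.concatenate([self.mean, self.scale, self.rotation,
                               [self.opacity], self.sh_coeffs])
```

None of these fields encodes framing, composition, or harmony directly; the premise is that such cues are nonetheless recoverable from their joint statistics across a scene.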

What would settle it

The question would be settled against the claim if human raters on a held-out set of 3DGS scenes produced scores uncorrelated with the model's predictions, or if the model merely matched or underperformed a simple baseline that ignores the 3D structure.
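
A hedged sketch of that test, under the assumption that agreement is measured with Spearman rank correlation against a structure-blind baseline (the exact protocol is not specified in the available text):

```python
# Sketch of the settling test: the claim survives only if predictions correlate
# with held-out human scores and beat a baseline that ignores 3D structure.
# The significance threshold and baseline choice are illustrative assumptions.
import numpy as np
from scipy.stats import spearmanr

def claim_survives(human: np.ndarray, model: np.ndarray, baseline: np.ndarray) -> bool:
    rho_model, p_model = spearmanr(human, model)
    rho_base, _ = spearmanr(human, baseline)
    return (p_model < 0.05) and (rho_model > 0) and (rho_model > rho_base)

# Toy usage with synthetic numbers (not real data):
rng = np.random.default_rng(0)
human = rng.uniform(1, 10, size=50)
model = human + rng.normal(0, 1.0, size=50)    # correlated predictions
baseline = rng.uniform(1, 10, size=50)         # structure-blind guess
print(claim_survives(human, model, baseline))  # prints True or False
```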

Figures

Figures reproduced from arXiv: 2605.05155 by Boyu Wei, Chuanzhi Xu, Haodong Chen, Haoxian Zhou, Qiang Qu, Weidong Cai, Xuanhua Yin, Zihan Deng.

Figure 1: Aes3D includes a method for IAA-based aesthetic annotation of 3D scene datasets, upon which the Aesthetic3D dataset is constructed. It also includes Aes3DGSNet, a model capable of evaluating the aesthetic scores of 3DGS scenes. Some scoring examples are shown below.
Figure 2: Overview of IAA-based annotation for constructing Aesthetic3D.
Figure 3: Statistical overview of Aesthetic3D (8-attr mean).
Figure 4: Overview of Aes3DGSNet.
Figure 5: Score distributions of different IAA annotators on the Aesthetic3D dataset.
Figure 6: Distribution of scene-level total scores (left) and within-scene total-score gaps (right).
Figure 7: Distribution of ArtiMuse attribute scores across datasets. Box plots of the eight attribute-level scores for DL3DV-10K and Bilarf. Bilarf exhibits consistently higher score ranges across all attributes, indicating a clear distribution shift toward higher aesthetic quality.
Figure 8: Pairwise Pearson correlations among the eight aesthetic attributes. Left: DL3DV-10K shows consistently high correlations (0.86–0.99), indicating strong collinearity among attributes. Right: Bilarf exhibits similar trends, but originality shows weaker correlations with other attributes, suggesting partial independence under certain conditions.
Figure 9: Correlation between attribute-level scores and the holistic aesthetic score. On DL3DV-10K, all attributes show very strong correlations with the total score (Pearson ≈ 0.94–0.98). On Bilarf, ranking consistency remains high (Spearman ≈ 1.0), while linear correlations vary more, especially for originality, indicating strong ordinal consistency but dataset-dependent linear relationships.
Figure 10: Visualizations of data annotation examples (1).
Figure 11: Visualizations of data annotation examples (2).
Figure 12: Screenshots of the custom rating interface used in the human study.
Figure 13: Alignment between Human Study ratings and proxy aesthetic scores. Left: comparison of the marginal distributions of Human Study ratings and the proxy score. Right: paired agreement between integer score grades and integer Human Study ratings, where bubble size indicates the number of participant-scene ratings at each pair. Most ratings concentrate near the diagonal in the mid-score range.
Figure 14: Visualization examples of using Aes3DGSNet for aesthetic assessment of 3DGS scenes.
Original abstract

As 3D Gaussian Splatting (3DGS) gains attention in immersive media and digital content creation, assessing the aesthetics of 3D scenes becomes important in helping creators build more visually compelling 3D content. However, existing evaluation methods for 3D scenes primarily emphasize reconstruction fidelity and perceptual realism, largely overlooking higher-level aesthetic attributes such as composition, harmony, and visual appeal. This limitation comes from two key challenges: (1) the absence of general 3DGS datasets with aesthetic annotations, and (2) the intrinsic nature of 3DGS as a low-level primitive representation, which makes it difficult to capture high-level aesthetic features. To address these challenges, we propose Aes3D, the first systematic framework for assessing the aesthetics of 3D neural rendering scenes. Aes3D includes Aesthetic3D, the first dataset dedicated to 3D scene aesthetic assessment, built on our proposed annotation strategy for 3D scene aesthetics. In addition, we present Aes3DGSNet, a lightweight model that directly predicts scene-level aesthetic scores from 3DGS representations. Notably, our model operates solely on 3D Gaussian primitives, eliminating the need for rendering multi-view images and thus reducing computational cost and hardware requirements. Through aesthetics-supervised learning on multi-view 3DGS scene representations, Aes3DGSNet effectively captures high-level aesthetic cues and accurately regresses aesthetic scores. Experimental results demonstrate that our approach achieves strong performance while maintaining a lightweight design, establishing a new benchmark for 3D scene aesthetic assessment. Code and datasets will be made available in a future version.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Aes3D as the first systematic framework for assessing aesthetics of 3D neural rendering scenes represented via 3D Gaussian Splatting. It introduces the Aesthetic3D dataset (the first dedicated to 3D scene aesthetic assessment, built via a proposed annotation strategy) and Aes3DGSNet, a lightweight model that directly regresses scene-level aesthetic scores (for attributes such as composition, harmony, and visual appeal) from 3DGS primitives alone, without rendering multi-view images. The model is trained via aesthetics-supervised learning on multi-view 3DGS representations, and the abstract claims that experiments demonstrate strong performance while maintaining a lightweight design, establishing a new benchmark.

Significance. If the central claims hold, the work would be significant for the field of neural rendering and immersive media by shifting evaluation focus from reconstruction fidelity to higher-level aesthetic attributes. The creation of a dedicated dataset and an efficient model that avoids rendering costs could enable practical tools for content creators. The use of multi-view training signals to learn from unordered 3D primitives is a potentially useful direction, though its soundness depends on validation details not supplied in the available text.

major comments (2)
  1. [Abstract] Abstract (final paragraph): The load-bearing claim that Aes3DGSNet 'directly predicts scene-level aesthetic scores from 3DGS representations' and 'operates solely on 3D Gaussian primitives, eliminating the need for rendering multi-view images' lacks any description of the architecture, aggregation mechanism over the unordered primitive set, or how view-dependent cues (occlusion, framing, lighting) are recovered. This is a correctness risk because aesthetic attributes are defined on 2D projections, and global statistics over primitives may not suffice even with multi-view training supervision.
  2. [Abstract] Abstract (experimental results sentence): No quantitative metrics, dataset statistics (scene count, annotation protocol, inter-annotator agreement), model size, baselines, train/test splits, or error bars are reported, making it impossible to evaluate the 'strong performance' or 'lightweight design' assertions or to determine whether the data supports the new-benchmark claim.
minor comments (1)
  1. [Abstract] Abstract: The promise that 'Code and datasets will be made available in a future version' is positive, but the manuscript should supply at least high-level dataset statistics and annotation guidelines to allow readers to assess the annotation strategy's reliability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the abstract. We respond to each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract (final paragraph): The load-bearing claim that Aes3DGSNet 'directly predicts scene-level aesthetic scores from 3DGS representations' and 'operates solely on 3D Gaussian primitives, eliminating the need for rendering multi-view images' lacks any description of the architecture, aggregation mechanism over the unordered primitive set, or how view-dependent cues (occlusion, framing, lighting) are recovered. This is a correctness risk because aesthetic attributes are defined on 2D projections, and global statistics over primitives may not suffice even with multi-view training supervision.

    Authors: We agree the abstract is highly condensed and omits these details. The full manuscript (Section 3) specifies that Aes3DGSNet uses a lightweight set-based aggregator (permutation-invariant operations over the 3D Gaussian attributes) trained with multi-view aesthetic supervision; view-dependent effects are learned implicitly through the supervision signal rather than explicit rendering at inference. We will revise the abstract to include a short clause describing the aggregation mechanism and the multi-view training strategy; a hedged sketch of such a training step follows these responses. revision: yes

  2. Referee: [Abstract] Abstract (experimental results sentence): No quantitative metrics, dataset statistics (scene count, annotation protocol, inter-annotator agreement), model size, baselines, train/test splits, or error bars are reported, making it impossible to evaluate the 'strong performance' or 'lightweight design' assertions or to determine whether the data supports the new-benchmark claim.

    Authors: The abstract follows conventional length constraints by summarizing results at a high level. All requested quantitative information (dataset size and annotation protocol, inter-annotator agreement, model parameter count, baseline comparisons, cross-validation splits, and error bars) appears in the Experiments section. We will revise the abstract to incorporate one or two key quantitative highlights (e.g., correlation with human ratings and parameter count) while respecting word limits. revision: yes
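
A hedged sketch of what the training step described in response 1 could look like: multi-view proxy scores collected at annotation time are reduced to a scene-level target, which the primitive-only model regresses. The mean reduction, the Huber loss (cf. ref [19]), and the optimizer are assumptions, not the paper's recipe.

```python
# Hedged sketch of a "multi-view aesthetic supervision" training step: per-view
# IAA scores from annotation are reduced to a scene-level target that the
# primitive-only model regresses. Reduction, loss, and optimizer are assumptions.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, gaussians, view_scores):
    # gaussians:   (B, N, D) raw primitives per scene (no rendering in this step)
    # view_scores: (B, V) per-view aesthetic scores from the IAA annotator
    target = view_scores.mean(dim=1)       # scene-level supervision signal
    pred = model(gaussians)                # (B,) predicted aesthetic scores
    loss = F.huber_loss(pred, target)      # robust regression loss (cf. ref [19])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```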

Circularity Check

0 steps flagged

No significant circularity; new dataset and model are independent contributions

Full rationale

The paper introduces a new dataset (Aesthetic3D) with a proposed annotation strategy and a new lightweight network (Aes3DGSNet) trained via supervised learning on multi-view 3DGS data to regress scene-level aesthetic scores directly from Gaussian primitives. No mathematical derivation chain, fitted parameters renamed as predictions, or self-citation load-bearing steps are present in the provided text. The central claims rest on empirical construction of the dataset and architecture rather than any reduction of outputs to inputs by definition or prior self-referential results. The approach is self-contained against external benchmarks of dataset creation and model performance.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not specify any free parameters, background axioms, or newly postulated entities. The contributions consist of a new annotated dataset and a neural network model for aesthetic prediction.

pith-pipeline@v0.9.0 · 5617 in / 1423 out tokens · 90771 ms · 2026-05-08T16:49:07.194983+00:00 · methodology


Reference graph

Works this paper leans on

67 extracted references · 12 canonical work pages

  1. [1] Abbas Anwar, Saira Kanwal, Muhammad Tahir, Muhammad Saqib, Muhammad Uzair, Mohammad Khalid Imam Rahmani, and Habib Ullah. A survey on image aesthetic assessment. arXiv preprint arXiv:2103.11616, 2021.
  2. [2] Fatemeh Behrad, Tinne Tuytelaars, and Johan Wagemans. Charm: the missing piece in vit fine-tuning for image aesthetic assessment. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 7815–7824, 2025.
  3. [3] Shyamal Buch, Arsha Nagrani, Anurag Arnab, and Cordelia Schmid. Flexible frame selection for efficient video reasoning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 29071–29082, 2025.
  4. [4] Shuo Cao, Ning Ma, Jiayang Li, Xiaolong Li, Ling Shao, Kaiwen Zhu, Yu Zhou, Yanfeng Wang, et al. Artimuse: Fine-grained image aesthetics assessment with joint scoring and expert-level understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026.
  5. [5] Luigi Celona, Marco Leonardi, Paolo Napoletano, and Alessandro Rozza. Composition and style attributes guided image aesthetic assessment. IEEE Transactions on Image Processing, 31:5009–5024, 2022.
  6. [6] Guikun Chen and Wenguan Wang. A survey on 3d gaussian splatting. ACM Computing Surveys.
  7. [7] ISSN 0360-0300. doi: 10.1145/3807511. URL https://doi.org/10.1145/3807511.
  8. [8] Tianang Chen, Jian Jin, Shilv Cai, Zhuangzi Li, and Weisi Lin. Mugsqa: Novel multi-uncertainty-based gaussian splatting quality assessment method, dataset, and benchmarks. In ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 11737–11741. IEEE, 2026.
  9. [9] Zhimin Chen, Xuewei Chen, Xiao Guo, Yingwei Li, Longlong Jing, Liang Yang, and Bing Li. Point cloud self-supervised learning via 3d to multi-view masked learner. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27618–27629, 2025.
  10. [10] Maedeh Daryanavard Chounchenani, Asadollah Shahbahrami, Reza Hassanpour, and Georgi Gaydadjiev. Deep learning based image aesthetic quality assessment - a review. ACM Computing Surveys, 57(7), February 2025. ISSN 0360-0300. doi: 10.1145/3716820. URL https://doi.org/10.1145/3716820.
  11. [11] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z Wang. Studying aesthetics in photographic images using a computational approach. In European conference on computer vision, pages 288–301. Springer, 2006.
  12. [12] Yubin Deng, Chen Change Loy, and Xiaoou Tang. Image aesthetic assessment: An experimental survey. IEEE Signal Processing Magazine, 34(4):80–106, 2017. doi: 10.1109/MSP.2017.2696576.
  13. [13] Yubin Deng, Chen Change Loy, and Xiaoou Tang. Aesthetic-driven image enhancement by adversarial learning. In Proceedings of the 26th ACM international conference on Multimedia, pages 870–878, 2018.
  14. [14] Zhiyuan Fang, Rengan Xie, Xuancheng Jin, Qi Ye, Wei Chen, Wenting Zheng, Rui Wang, and Yuchi Huo. A3gs: Arbitrary artistic style into arbitrary 3d gaussian splatting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 17751–17760, October 2025.
  15. [15] Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, and Ying He. 3d gaussian splatting as new era: A survey. IEEE Transactions on Visualization and Computer Graphics, 2024.
  16. [16] Shuai He, Yongchang Zhang, Rui Xie, Dongxiang Jiang, and Anlong Ming. Rethinking image aesthetics assessment: Models, datasets and benchmarks. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, pages 942–948, 2022. doi: 10.24963/ijcai.2022/132. URL https://doi.org/10.24963/ijcai.2022/132.
  17. [17] Vlad Hosu, Bastian Goldlucke, and Dietmar Saupe. Effective aesthetics prediction with multi-level spatially pooled features. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9375–9383, 2019.
  18. [18] Yipo Huang, Xiangfei Sheng, Zhichao Yang, Quan Yuan, Zhichao Duan, Pengfei Chen, Leida Li, Weisi Lin, and Guangming Shi. Aesexpert: Towards multi-modality foundation model for image aesthetics perception. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 5911–5920, 2024.
  19. [19] Peter J Huber. Robust estimation of a location parameter. In Breakthroughs in statistics: Methodology and distribution, pages 492–518. Springer, 1992.
  20. [20] Xin Jin, Le Wu, Geng Zhao, Xiaodong Li, Xiaokun Zhang, Shiming Ge, Dongqing Zou, Bin Zhou, and Xinghui Zhou. Aesthetic attributes assessment of images. In Proceedings of the 27th ACM international conference on multimedia, pages 311–319, 2019.
  21. [21] Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 133–142, 2002.
  22. [22] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021.
  23. [23] Junjie Ke, Keren Ye, Jiahui Yu, Yonghui Wu, Peyman Milanfar, and Feng Yang. Vila: Learning image aesthetics from user comments with vision-language pretraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10041–10051, 2023.
  24. [24] Yan Ke, Xiaoou Tang, and Feng Jing. The design of high-level features for photo quality assessment. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 1, pages 419–426. IEEE, 2006.
  25. [25] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023.
  26. [26] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4):78:1–78:13, 2017.
  27. [27] Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, and Charless Fowlkes. Photo aesthetics ranking network with attributes and content adaptation. In European Conference on Computer Vision (ECCV), pages 662–679. Springer, 2016.
  28. [28] Zheng Li, Bingxu Xie, Chao Chu, Weiqing Li, and Zhiyong Su. No-reference geometry quality assessment for colorless point clouds via list-wise rank learning. Computers & Graphics, 127:104176, 2025.
  29. [29] Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, and Cengiz Öztireli. Perceptual quality assessment of nerf and neural view synthesis methods for front-facing views. In Computer Graphics Forum, volume 43, page e15036. Wiley Online Library, 2024.
  30. [30] Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  31. [31] Xin Lu, Zhe Lin, Hailin Jin, Jianchao Yang, and James Z Wang. Rapid: Rating pictorial aesthetics using deep learning. In Proceedings of the 22nd ACM international conference on Multimedia, pages 457–466, 2014.
  32. [32] Xin Lu, Zhe Lin, Hailin Jin, Jianchao Yang, and James Z Wang. Rating image aesthetics using deep learning. IEEE Transactions on Multimedia, 17(11):2021–2034, 2015.
  33. [33] Pei Lv, Jianqi Fan, Xixi Nie, Weiming Dong, Xiaoheng Jiang, Bing Zhou, Mingliang Xu, and Changsheng Xu. User-guided personalized image aesthetic assessment based on deep reinforcement learning. IEEE Transactions on Multimedia, 25:736–749, 2021.
  34. [34] Luca Marchesotti, Naila Murray, and Florent Perronnin. Discovering beautiful attributes for aesthetic image analysis. International journal of computer vision, 113(3):246–266, 2015.
  35. [35] Pedro Martin, António Rodrigues, João Ascenso, and Maria Paula Queluz. Nerf view synthesis: Subjective quality assessment and objective metrics evaluation. IEEE Access, 13:26–41, 2024.
  36. [36] Pedro Martin, António Rodrigues, João Ascenso, and Maria Paula Queluz. Gs-qa: Comprehensive quality assessment benchmark for gaussian splatting view synthesis. In 2025 17th International Conference on Quality of Multimedia Experience (QoMEX), pages 1–7. IEEE, 2025.
  37. [37] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision (ECCV), 2020.
  38. [38] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  39. [39] Naila Murray, Luca Marchesotti, and Florent Perronnin. Ava: A large-scale database for aesthetic visual analysis. In 2012 IEEE conference on computer vision and pattern recognition, pages 2408–2415. IEEE, 2012.
  40. [40] Ezgi Ozyilkan, Zhiqi Chen, Oren Rippel, Jona Ballé, and Kedar Tatwawadi. Drop-in perceptual optimization for 3d gaussian splatting. arXiv preprint arXiv:2603.23297, 2026.
  41. [41] Jongchan Park, Joon-Young Lee, Donggeun Yoo, and In So Kweon. Distort-and-recover: Color enhancement using deep reinforcement learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5928–5936, 2018.
  42. [42] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 30, 2017.
  43. [43] Qiang Qu, Hanxue Liang, Xiaoming Chen, Yuk Ying Chung, and Yiran Shen. Nerf-nqa: No-reference quality assessment for scenes generated by nerf and neural view synthesis methods. IEEE Transactions on Visualization and Computer Graphics, 30(5):2129–2139, 2024.
  44. [44] Qiang Qu, Yiran Shen, Xiaoming Chen, Yuk Ying Chung, Weidong Cai, and Tongliang Liu. Nvs-sqa: Exploring self-supervised quality representation learning for neurally synthesized scenes without references. arXiv preprint arXiv:2501.06488, 2025.
  45. [45] Jian Ren, Xiaohui Shen, Zhe Lin, Radomir Mech, and David J Foran. Personalized image aesthetics. In Proceedings of the IEEE international conference on computer vision, pages 638–647, 2017.
  46. [46] Dongyu She, Yu-Kun Lai, Gaoxiong Yi, and Kun Xu. Hierarchical layout-aware graph convolutional network for unified aesthetics assessment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8475–8484, 2021.
  47. [47] Kekai Sheng, Weiming Dong, Chongyang Ma, Xing Mei, Feiyue Huang, and Bao-Gang Hu. Attention-based multi-patch aggregation for image aesthetic assessment. In Proceedings of the 26th ACM international conference on Multimedia, pages 879–886, 2018.
  48. [48] Hossein Talebi and Peyman Milanfar. Nima: Neural image assessment. IEEE transactions on image processing, 27(8):3998–4011, 2018.
  49. [49] Zhaolin Wan, Yining Diao, Jingqi Xu, Hao Wang, Zhiyang Li, Xiaopeng Fan, Wangmeng Zuo, and Debin Zhao. Perceptual quality assessment of 3d gaussian splatting: A subjective dataset and prediction metric. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 9657–9665, 2026.
  50. [50] Qian Wang, Zongju Peng, Wenhui Zou, Fen Chen, Kai Xu, and Youshuang Zhao. Of-nerf: A subjective benchmark of perceptual quality assessment for outward-facing nerf scenes with multiple distortions and diverse viewing trajectories. In Proceedings of the 7th ACM International Conference on Multimedia in Asia, pages 1–7, 2025.
  51. [51] Yuehao Wang, Chaoyi Wang, Bingchen Gong, and Tianfan Xue. Bilateral guided radiance field processing. ACM Transactions on Graphics (TOG), 43(4):1–13, 2024.
  52. [52] Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Exploring video quality assessment on user generated contents from aesthetic and technical perspectives. In Proceedings of the IEEE/CVF international conference on computer vision, pages 20144–20154, 2023.
  53. [53] Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point transformer v3: Simpler faster stronger. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4840–4851, 2024.
  54. [54] Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, and Julian Straub. Sonata: Self-supervised learning of reliable point representations. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 22193–22204, 2025.
  55. [55] Yuke Xing, Jiarui Wang, Peizhi Niu, Wenjie Huang, Guangtao Zhai, and Yiling Xu. 3dgs-ieval-15k: a large-scale image quality evaluation database for 3d gaussian-splatting. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 12682–12689, 2025.
  56. [56] Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. arXiv preprint arXiv:2410.13862, 2024.
  57. [57] Mingze Xu et al. Stochasticity-aware no-reference point cloud quality assessment. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2025.
  58. [58] Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, and Yizhou Yu. Automatic photo adjustment using deep neural networks. ACM Transactions on Graphics (TOG), 35(2):1–15, 2016.
  59. [59] Qi Yang, Kaifa Yang, Yuke Xing, Yiling Xu, and Zhu Li. A benchmark for gaussian splatting compression and quality assessment study. In Proceedings of the 6th ACM International Conference on Multimedia in Asia, pages 1–8, 2024.
  60. [60] Yuzhe Yang, Liwu Xu, Leida Li, Nan Qie, Yaqian Li, Peng Zhang, and Yandong Guo. Personalized image aesthetics assessment with rich attributes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19861–19869, 2022.
  61. [61] Hsin-Ho Yeh, Chun-Yu Yang, Ming-Sui Lee, and Chu-Song Chen. Video aesthetic quality assessment by temporal integration of photo- and motion-based features. IEEE transactions on multimedia, 15(8):1944–1957, 2013.
  62. [62] Xuanhua Yin, Chuanzhi Xu, Haoxian Zhou, Boyu Wei, and Weidong Cai. Accelaes: Accelerating diffusion transformers for training-free aesthetic-enhanced image generation. arXiv preprint arXiv:2603.12575, 2026.
  63. [63] Dingxi Zhang, Yu-Jie Yuan, Zhuoxun Chen, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, and Lin Gao. Stylizedgs: Controllable stylization for 3d gaussian splatting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(12):11961–11973, 2025. doi: 10.1109/TPAMI.2025.3604010.
  64. [64] Qi Zhang, Yiling Li, Guangtao Zhai, and Kaifa Yang. Mm-pcqa: Multi-modal learning for no-reference point cloud quality assessment. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2023.
  65. [65] Yuhang Zhang, Joshua Maraval, Zhengyu Zhang, Nicolas Ramin, Shishun Tian, and Lu Zhang. Evaluating human perception of novel view synthesis: Subjective quality assessment of gaussian splatting and nerf in dynamic scenes. arXiv preprint arXiv:2501.08072, 2025.
  66. [66] Hongbi Zhou and Zhangkai Ni. Perceptual-gs: Scene-adaptive perceptual densification for gaussian splatting. arXiv preprint arXiv:2506.12400, 2025.
  67. [67] Hancheng Zhu, Leida Li, Jinjian Wu, Sicheng Zhao, Guiguang Ding, and Guangming Shi. Personalized image aesthetics assessment via meta-learning with bilevel gradient optimization. IEEE Transactions on Cybernetics, 52(3):1798–1811, 2020.