pith.

arxiv: 2604.09022 · v1 · submitted 2026-04-10 · 💻 cs.CV

BlendFusion -- Scalable Synthetic Data Generation for Diffusion Model Training

Pith reviewed 2026-05-10 18:00 UTC · model grok-4.3

classification 💻 cs.CV
keywords synthetic data, diffusion models, 3D rendering, image captioning, path tracing, data generation pipeline, model training

The pith

A pipeline using path tracing on 3D scenes produces synthetic image-caption data that trains diffusion models without visual inconsistencies or model collapse.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BlendFusion, a method for generating large-scale synthetic training data for diffusion models by rendering images from 3D scenes. It targets model collapse, which occurs when models are trained on data generated by other diffusion models. The approach places cameras so that they focus on objects (object-centric camera placement), applies quality filters, and generates captions automatically. The result is the FineBLEND dataset, which the authors compare favorably with existing image-caption collections. They also report that the object-centric camera strategy is more effective than object-agnostic sampling.
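The three stages named above compose naturally as a streaming pipeline. The sketch below is illustrative only. `Sample`, `run_pipeline`, and the stub callables are hypothetical names, not BlendFusion's API. Running the filter before the captioner is also an assumption. It is suggested by the order of stages in the abstract and by the appendix prompt, which judges renders "for a captioning dataset".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Sample:
    """One candidate image-caption pair (hypothetical record type)."""
    scene_id: str
    view_id: int
    image: object                 # rendered image; placeholder type
    caption: Optional[str] = None


def run_pipeline(
    views: Iterable[tuple[str, int]],
    render: Callable[[str, int], object],
    keep: Callable[[object], bool],
    caption: Callable[[object], str],
) -> Iterator[Sample]:
    """Stream (scene, view) pairs through render -> filter -> caption.

    Rejected renders are dropped before captioning, so the (usually
    expensive) captioner only runs on images that pass the filter.
    """
    for scene_id, view_id in views:
        image = render(scene_id, view_id)
        if not keep(image):
            continue
        yield Sample(scene_id, view_id, image, caption(image))


# Toy usage: stub stages stand in for a renderer, a quality filter,
# and a captioner.
samples = list(run_pipeline(
    [("bistro", i) for i in range(4)],
    render=lambda scene, view: {"scene": scene, "brightness": 0.3 * view},
    keep=lambda img: img["brightness"] > 0.1,  # toy "too dark" check
    caption=lambda img: f"a view of the {img['scene']} scene",
))
```

Because `run_pipeline` is a generator, a large job can consume samples one at a time instead of holding every render in memory.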

Core claim

By rendering images from diverse 3D scenes using path tracing with an object-centric camera placement strategy, robust filtering, and automatic captioning, we produce synthetic data that maintains visual consistency and supports effective diffusion model training without the autophagous feedback loop.

What carries the argument

Object-centric camera placement strategy combined with path tracing rendering, robust filtering, and automatic captioning to generate image-caption pairs from 3D scenes.
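One plausible reading of "object-centric" camera placement works in three steps:

1. Aim every camera at a target object's bounding sphere.
2. Set the distance so that the sphere fills the frame.
3. Spread the viewpoints around the object.

The sketch below does exactly that. The elevation band, framing margin, and candidate count are hypothetical parameters. The paper cites the farthest point strategy of Eldar et al. [8]. Using greedy farthest-point selection to pick well-separated views is our assumption about where that citation applies, not a confirmed detail of BlendFusion.

```python
import numpy as np


def fit_distance(radius: float, fov_y_rad: float, margin: float = 1.1) -> float:
    """Camera distance at which a bounding sphere fits the vertical field of view."""
    # A sphere of radius r at distance d subtends a half-angle asin(r / d).
    return margin * radius / np.sin(fov_y_rad / 2.0)


def farthest_point_indices(points: np.ndarray, k: int, start: int = 0) -> list[int]:
    """Greedy farthest-point selection of k row indices from an (N, D) array."""
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(1, k):
        idx = int(np.argmax(dist))          # farthest from everything chosen so far
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return chosen


def object_centric_cameras(center, radius, fov_y_rad, k=8, n_candidates=256,
                           min_elev_deg=5.0, max_elev_deg=60.0, seed=0):
    """Return (k, 3) camera positions around `center` (z is up).

    Each camera is meant to be pointed at `center`; only positions are computed.
    """
    rng = np.random.default_rng(seed)
    azim = rng.uniform(0.0, 2.0 * np.pi, n_candidates)
    elev = np.deg2rad(rng.uniform(min_elev_deg, max_elev_deg, n_candidates))
    # Candidate unit directions in an elevation band of the upper hemisphere.
    dirs = np.stack([np.cos(elev) * np.cos(azim),
                     np.cos(elev) * np.sin(azim),
                     np.sin(elev)], axis=1)
    picks = farthest_point_indices(dirs, k)
    d = fit_distance(radius, fov_y_rad)
    return np.asarray(center, dtype=float) + d * dirs[picks]


cams = object_centric_cameras(center=(0.0, 0.0, 1.0), radius=0.5,
                              fov_y_rad=np.deg2rad(50.0), k=6)
```

An object-agnostic baseline, by contrast, would sample positions anywhere in the scene, with no guarantee that any object is in frame.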

If this is right

  • Synthetic datasets can be created at scale from any collection of 3D models without needing real photographs.
  • Diffusion models trained on such data maintain performance without entering a feedback loop of degradation.
  • The object-centric strategy yields higher quality data than random camera sampling in the same scenes.
  • The community can use the open-source framework to build custom datasets tailored to specific domains.
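The feedback loop in the second bullet can be illustrated with a toy model in the spirit of the self-consuming-model work the paper cites ([1], [36]). The model repeatedly refits a Gaussian to samples drawn from the previous fit. This is not an experiment from the paper, and it says nothing directly about diffusion models. Mixing in fresh samples from the true distribution stands in, by analogy only, for a non-autophagous data source such as rendered images. All names and parameter values are hypothetical.

```python
import numpy as np


def self_consuming_loop(generations=5000, n=200, fresh_fraction=0.0, seed=0):
    """Toy self-consuming loop.

    Each generation fits a 1-D Gaussian (maximum likelihood) to n samples:
    a `fresh_fraction` share drawn from the true N(0, 1) and the rest drawn
    from the previous generation's fit. Returns the final fitted std.
    """
    rng = np.random.default_rng(seed)
    n_fresh = int(round(fresh_fraction * n))
    mu, sigma = 0.0, 1.0
    for _ in range(generations):
        synthetic = rng.normal(mu, sigma, n - n_fresh)
        fresh = rng.normal(0.0, 1.0, n_fresh)
        data = np.concatenate([synthetic, fresh])
        mu, sigma = float(data.mean()), float(data.std())
    return sigma


closed = self_consuming_loop(fresh_fraction=0.0)  # trains only on its own samples
mixed = self_consuming_loop(fresh_fraction=0.5)   # half fresh "real" data each round
print(f"closed loop std: {closed:.3g}, with fresh data: {mixed:.3g}")
```

In this toy, the purely self-consuming loop drifts toward zero variance because each refit slightly underestimates the spread and the errors compound. The loop that keeps receiving fresh data stays near the true spread.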

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could extend to generating data for other computer vision tasks beyond diffusion models.
  • Reducing dependence on scraped internet images might improve data privacy and reduce copyright concerns in model training.
  • If 3D scenes are procedurally generated, the approach might allow infinite data variety.

Load-bearing premise

That the synthetic images produced by path tracing 3D scenes capture enough visual variety and realism to train models that perform well on real-world images.

What would settle it

If a diffusion model trained solely on the FineBLEND dataset generates images with the same visual artifacts and quality degradation as one trained on pure diffusion outputs, that would indicate the claim does not hold.
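Running this test would require a distribution-level metric. FID, one of the metrics the referee mentions, is the Fréchet distance between Gaussians fitted to image features:

d² = ||mu_a - mu_b||² + Tr(S_a + S_b - 2 (S_a S_b)^(1/2))

Here mu_a and mu_b are the feature means, and S_a and S_b are the feature covariances of the two image sets. A minimal NumPy sketch follows. Standard FID uses Inception-v3 pool features; here the feature arrays are assumed to be precomputed by whatever encoder you choose. `frechet_distance` is a hypothetical helper, and this review does not state which metrics the paper reports.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of square roots of the eigenvalues
    # of cov_a @ cov_b. The product is similar to a PSD matrix, so those
    # eigenvalues are real and non-negative up to round-off.
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

One way to make the test above concrete is to track this distance against a held-out real image set across successive training generations. The comparison would pit a model trained on FineBLEND against one trained on its own outputs.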

Figures

Figures reproduced from arXiv: 2604.09022 by Suguna Varshini Velury, Thejas Venkatesh.

Figure 1. Images generated by BlendFusion for different scenes.
Figure 2. The BlendFusion pipeline.

Synthetic data from 3D scenes. An alternative approach is to generate training data from 3D assets and scenes using physically based rendering. Graphics-driven pipelines such as BlenderProc [7] and Kubric [13] enable scalable simulation, rendering, and annotation of synthetic visual data. Large repositories of 3D assets, including Objaverse [4] and Objaverse-XL [5], further enable…
Figure 3. Scene composition for the BlendFusion and …
Figure 4. Qualitative comparison between BlendFusion path-traced renders (top) and SDXL-generated images (bottom) conditioned on the …
Original abstract

With the rapid adoption of diffusion models, synthetic data generation has emerged as a promising approach for addressing the growing demand for large-scale image datasets. However, images generated purely by diffusion models often exhibit visual inconsistencies, and training models on such data can create an autophagous feedback loop that leads to model collapse, commonly referred to as Model Autophagy Disorder (MAD). To address these challenges, we propose BlendFusion, a scalable framework for synthetic data generation from 3D scenes using path tracing. Our pipeline incorporates an object-centric camera placement strategy, robust filtering mechanisms, and automatic captioning to produce high-quality image-caption pairs. Using this pipeline, we curate FineBLEND, an image-caption dataset constructed from a diverse set of 3D scenes. We empirically analyze the quality of FineBLEND and compare it to several widely used image-caption datasets. We also demonstrate the effectiveness of our object-centric camera placement strategy relative to object-agnostic sampling approaches. Our open-source framework is designed for high configurability, enabling the community to create their own datasets from 3D scenes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes BlendFusion, a scalable framework for generating synthetic image-caption pairs from diverse 3D scenes using path tracing. The pipeline includes an object-centric camera placement strategy, robust filtering, and automatic captioning to curate the FineBLEND dataset. It reports an empirical quality analysis comparing FineBLEND with existing datasets (e.g., LAION, COCO) and demonstrates the camera strategy's effectiveness via an ablation against object-agnostic sampling. The open-source framework is presented as configurable for community use in diffusion model training.

Significance. If the quality claims hold under rigorous metrics, BlendFusion could provide a practical, non-autophagous source of synthetic data that mitigates visual inconsistencies in diffusion training pipelines. The open-source release and configurability add value for reproducibility. However, the absence of any diffusion model training experiments or iterative generation tests means the work primarily contributes a data generation method rather than a validated solution to Model Autophagy Disorder.

major comments (2)
  1. [Abstract and §1, motivation] The paper frames BlendFusion as addressing visual inconsistencies and the MAD autophagous feedback loop in diffusion models. Yet no experiments train diffusion models on FineBLEND, generate new images iteratively, or report degradation/consistency metrics across iterations. This leaves the central claim that the 3D-rendered data prevents the feedback loop as an untested assumption rather than a demonstrated result.
  2. [Empirical analysis section, quality comparisons] The abstract states that FineBLEND is compared to widely used datasets and that the object-centric strategy is shown to be effective. However, no specific metrics (e.g., FID, CLIP scores, caption accuracy), baselines, exclusion criteria, or statistical details are referenced. Without these, the superiority claims cannot be verified and may reflect post-hoc selection that is not shown.
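The paper cites CLIPScore [15], which illustrates the kind of explicit metric the second comment asks for. CLIPScore is defined as w * max(cos(E_I, E_C), 0) with w = 2.5, where E_I and E_C are the CLIP embeddings of the image and the caption. The sketch below assumes the embeddings are already computed. `clip_score` and `bootstrap_mean_ci` are hypothetical helpers. The percentile bootstrap is just one common way to attach the statistical detail the referee requests.

```python
import numpy as np


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore per row: w * max(cos(image, text), 0)."""
    img = np.asarray(image_emb, dtype=float)
    txt = np.asarray(text_emb, dtype=float)
    img = img / np.linalg.norm(img, axis=-1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=-1, keepdims=True)
    return w * np.maximum(np.sum(img * txt, axis=-1), 0.0)


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean with a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: each row is one bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lo), float(hi)
```

Reporting the mean score per dataset together with its interval is what lets a reader check claims like "FineBLEND compares favorably".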
minor comments (1)
  1. [Methods] The manuscript should explicitly state the number of 3D scenes, path-tracing parameters (samples, bounces), and filtering thresholds to enable exact reproduction of FineBLEND.
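In practice, the minor comment amounts to publishing a machine-readable manifest of generation settings. Below is a minimal sketch, assuming a JSON record is acceptable. `RenderManifest`, its field names, and the example values are hypothetical placeholders, not FineBLEND's actual settings.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RenderManifest:
    """Hypothetical record of the settings a reproduction would need."""
    num_scenes: int
    samples_per_pixel: int      # path-tracing samples
    max_bounces: int            # path-tracing bounce limit
    resolution: tuple[int, int]
    vlm_filter_required: bool   # keep only renders the VLM judged GOOD
    min_clip_score: float       # caption-alignment threshold
    seed: int

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        """Short hash that identifies this exact configuration."""
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()[:12]

    @classmethod
    def from_json(cls, text: str) -> "RenderManifest":
        data = json.loads(text)
        data["resolution"] = tuple(data["resolution"])  # JSON has no tuples
        return cls(**data)


# Placeholder values for illustration only.
manifest = RenderManifest(num_scenes=3, samples_per_pixel=512, max_bounces=8,
                          resolution=(1024, 1024), vlm_filter_required=True,
                          min_clip_score=0.6, seed=0)
```

Shipping the manifest, or its fingerprint, alongside the dataset makes "exact reproduction of FineBLEND" checkable rather than aspirational.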

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of the open-source framework and configurability. We address each major comment below, clarifying the intended scope of the work while committing to revisions that improve transparency and precision without misrepresenting the manuscript's contributions.

Point-by-point responses
  1. Referee: major comment 1 above ([Abstract and §1], motivation).

    Authors: We agree that the manuscript does not include diffusion model training experiments, iterative image generation, or direct metrics on degradation across iterations. The central motivation in the abstract and §1 is that path-traced data from diverse 3D scenes can serve as a non-autophagous source to help mitigate visual inconsistencies and MAD. This motivation rests on the observation that such renders avoid the distributional artifacts of purely generative pipelines. Our contribution, however, is the scalable generation framework, the FineBLEND dataset, and the empirical quality analysis plus camera-placement ablation. We will revise the abstract and §1 to state explicitly that BlendFusion provides a method for producing high-quality data with the potential to address these issues. The MAD-related benefits will be framed as a motivating hypothesis supported by the data characteristics rather than a directly validated outcome. Revision: partial.

  2. Referee: major comment 2 above ([Empirical analysis section], quality comparisons).

    Authors: We acknowledge that greater explicitness is needed. The empirical analysis section does present quality comparisons against datasets such as LAION and COCO and an ablation of the object-centric camera strategy versus object-agnostic sampling. We agree, however, that the current text does not sufficiently detail the exact quantitative metrics (e.g., FID, CLIP similarity), the caption accuracy evaluation protocol, the exclusion/filtering criteria, or the statistical reporting. We will revise the section to add a summary table and accompanying text that list the specific metrics, baselines, filtering rules, and any statistical details used. This will make the comparisons fully verifiable and address concerns about post-hoc selection. Revision: yes.

standing simulated objections not resolved
  • The absence of diffusion model training experiments or iterative generation tests means the direct claim that FineBLEND prevents the MAD feedback loop cannot be empirically demonstrated within the current manuscript scope.

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper proposes a pipeline for generating image-caption pairs from path-traced 3D scenes (object-centric camera placement, filtering, auto-captioning) and curates the FineBLEND dataset, followed by static empirical comparisons to existing datasets like LAION and COCO plus a camera-placement ablation. No equations, derivations, predictions, or first-principles results appear in the manuscript. Claims rest entirely on the described methodology and reported metrics rather than any reduction to fitted parameters, self-definitions, or self-citation chains, satisfying the criteria for a self-contained non-circular contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. Path tracing is treated as a standard rendering method from prior graphics literature.
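For readers unfamiliar with it, path tracing estimates the rendering integral by Monte Carlo sampling, so per-pixel noise falls roughly as 1/sqrt(N) in the number of samples per pixel. The toy below estimates irradiance under a uniform sky of radiance L, whose exact value is pi * L. It compares uniform and cosine-weighted hemisphere sampling. It is a single-bounce illustration of the estimator, not BlendFusion's renderer, and the function names are hypothetical.

```python
import numpy as np


def irradiance_uniform(radiance, n, rng):
    """Estimate E = integral of L*cos(theta) over the hemisphere (exact: pi*L)
    with uniform hemisphere sampling, pdf = 1 / (2*pi)."""
    # For directions uniform on the hemisphere, cos(theta) is uniform on [0, 1].
    cos_theta = rng.uniform(0.0, 1.0, n)
    return float(np.mean(radiance * cos_theta * 2.0 * np.pi))


def irradiance_cosine(radiance, n, rng):
    """Same integral with cosine-weighted sampling, pdf = cos(theta) / pi."""
    # Inverse-CDF sampling: cos(theta) = sqrt(u); using 1 - uniform keeps u > 0.
    cos_theta = np.sqrt(1.0 - rng.uniform(0.0, 1.0, n))
    pdf = cos_theta / np.pi
    return float(np.mean(radiance * cos_theta / pdf))


rng = np.random.default_rng(0)
for n in (16, 256, 4096):
    estimates = [irradiance_uniform(1.0, n, rng) for _ in range(100)]
    print(n, np.std(estimates))  # spread shrinks roughly as 1 / sqrt(n)
```

In this special case, cosine-weighted sampling gives every sample the same weight, exactly pi * L. That is why importance sampling matters as much as the raw samples-per-pixel budget.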

pith-pipeline@v0.9.0 · 5489 in / 1215 out tokens · 45406 ms · 2026-05-10T18:00:17.423994+00:00 · methodology



Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

  [1] Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, and Richard G. Baraniuk. Self-consuming generative models go MAD, 2023.

  [2] Sina Alemohammad, Zhangyang Wang, and Richard G. Baraniuk. Neon: Negative extrapolation from self-training improves image generation, 2025.

  [3] Giulia Bertazzini, Daniele Baracchi, Dasara Shullani, Isao Echizen, and Alessandro Piva. Dragon: A large-scale dataset of realistic images generated by diffusion models, 2025.

  [4] Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3D objects, 2022.

  [5] Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. Objaverse-XL: A universe of 10M+ 3D objects, 2023.

  [6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

  [7] Maximilian Denninger, Martin Sundermeyer, Dominik Winkelbauer, Youssef Zidan, Dmitry Olefir, Mohamad Elbadrawy, Ahsan Lodhi, and Harinandan Katam. BlenderProc, 2019.

  [8] Yuval Eldar, Michael Lindenbaum, Moshe Porat, and Yehoshua Zeevi. The farthest point strategy for progressive image sampling. IEEE Transactions on Image Processing, 6:1305–15, 1997.

  [9] Yongqing Fan et al. Multi-modal synthetic data training and model collapse: Insights from VLMs and diffusion models. arXiv preprint arXiv:2505.08803, 2025.

  [10] Epic Games. Unreal Engine Sun Temple, Open Research Content Archive (ORCA), 2017. http://developer.nvidia.com/orca/epic-games-sun-temple.

  [11] Noa Garcia, Yusuke Hirota, Yankun Wu, and Yuta Nakashima. Uncurated image-text datasets: Shedding light on demographic bias, 2023.

  [12] Mandeep Goyal and Qusay H. Mahmoud. A systematic review of synthetic data generation techniques using generative AI. Electronics, 13(17), 2024.

  [13] Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi...

  [14] Aman Gupta, Deepak Bhatt, and Anubha Pandey. Transitioning from real to synthetic data: Quantifying the bias in model, 2021.

  [15] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning, 2022.

  [16] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020.

  [17] LAION. Releasing Re-LAION-5B: transparent iteration on LAION-5B with additional safety fixes. https://laion.ai/blog/relaion-5b/, 2024. Accessed: 30 Aug 2024.

  [18] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft COCO: Common objects in context, 2015.

  [19] Amazon Lumberyard. Amazon Lumberyard Bistro, Open Research Content Archive (ORCA), 2017. http://developer.nvidia.com/orca/amazon-lumberyard-bistro.

  [20] Tiange Luo, Chris Rockwell, Honglak Lee, and Justin Johnson. Scalable 3D captioning with pretrained models, 2023.

  [21] Tiange Luo, Justin Johnson, and Honglak Lee. View selection for 3D captioning via diffusion ranking, 2025.

  [22] Alex Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models, 2021.

  [23] Kate Anderson, Nicholas Hull, and Nir Benty. NVIDIA Emerald Square, Open Research Content Archive (ORCA), 2017. http://developer.nvidia.com/orca/nvidia-emerald-square.

  [24] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, ... DINOv2: Learning robust visual features without supervision, 2024.

  [25] Ziheng Ouyang, Yiren Song, Yaoli Liu, Shihao Zhu, Qibin Hou, Ming-Ming Cheng, and Mike Zheng Shou. The consistency critic: Correcting inconsistencies in generated images via reference-guided attentive alignment, 2025.

  [26] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023.

  [27] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021.

  [28] Stephan R. Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun. Playing for data: Ground truth from computer games, 2016.

  [29] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022.

  [30] German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio M. Lopez. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3234–3243, ...

  [31] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding, 2022.

  [32] Daniel Saragih, Atsuhiro Hibi, and Pascal Tyrrell. Using diffusion models to generate synthetic labelled data for medical image segmentation, 2024.

  [33] Christoph Schuhmann. LAION-Aesthetics. https://laion.ai/blog/laion-aesthetics/, 2022.

  [34] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. LAION-5B: An open large-scale dataset for training next generation image-text models, 2022.

  [35] Piyush Sharma, Nan Ding, Sebastian Goodman, and Radu Soricut. Conceptual Captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2556–2565. Association for Computational Linguistics, 2018.

  [36] Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, and Ross Anderson. The curse of recursion: Training on generated data makes models forget, 2024.

  [37] Qwen Team. Qwen3 technical report, 2025.

  [38] Jonathan Tremblay, Thang To, and Stan Birchfield. Falling things: A synthetic dataset for 3D object detection and pose estimation, 2018.

  [39] Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, and Duen Horng Chau. DiffusionDB: A large-scale prompt gallery dataset for text-to-image generative models, 2023.

  [40] Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, and Chunhua Shen. DatasetDM: Synthesizing data with perception annotations using diffusion models, 2023.

  [41] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models, 2023.

A. Prompts

VLM Filtering Prompt

  You are evaluating a low-resolution synthetic render for a captioning dataset. Decide if the image is GOOD or BAD.
  Output format (exactly one line):
  - "GOOD: <brief factual reason>"
  - "BAD: <brief factual...

  EXTREME CROP / CLOSE-UP
  - BAD if the view is an extreme close-up or partial fragment such that the subject cannot be confidently named.
  - BAD if the frame is dominated by a single surface/part (e.g., cheek, wall, texture) without context.
  - BAD if >30% of the subject is cut off OR the crop removes key identifying parts (e.g., head missing, face half missi...

  IDENTIFIABILITY FAILURE
  - BAD if you cannot identify WHAT it is (object type OR scene type) in one short noun phrase

  RENDER / SYNTHETIC ERRORS
  - BAD if obvious rendering artifacts exist: clipping/interpenetration, broken geometry, missing textures/materials, NaN/black patches, fireflies/bright speckles, extreme distortion

  VISIBILITY FAILURE
  - BAD if too dark/bright/blurred/noisy to recognize major shapes and boundaries.
  - BAD if mostly blank/black/solid color.

  FRAMING RULES
  - Object-centric GOOD only if the full object OR a clearly intentional, informative partial view is shown. (Example acceptable partial: "close-up of a clock face" where it is clearly a clock.)
  - Scene-c...
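The filtering prompt requires the VLM to answer with exactly one line, either "GOOD: <reason>" or "BAD: <reason>", so a pipeline needs a strict parser for that reply. The sketch below is hypothetical (`Verdict`, `parse_verdict`). Treating a malformed reply as `None` rather than as BAD is a design choice the prompt excerpt does not settle. A caller might retry or discard such renders.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    keep: bool
    reason: str


# One line: optional surrounding quotes, an uppercase label, a non-empty reason.
_VERDICT = re.compile(r'^\s*"?(GOOD|BAD):\s*(.+?)"?\s*$')


def parse_verdict(reply: str) -> Optional[Verdict]:
    """Parse a one-line 'GOOD: <reason>' / 'BAD: <reason>' reply.

    Returns None when the reply breaks the required format (more than one
    non-empty line, missing label, or empty reason).
    """
    lines = [ln for ln in reply.strip().splitlines() if ln.strip()]
    if len(lines) != 1:
        return None
    m = _VERDICT.match(lines[0])
    if m is None:
        return None
    return Verdict(keep=(m.group(1) == "GOOD"), reason=m.group(2).strip())


print(parse_verdict("GOOD: close-up of a clock face"))
```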