Designing streetscapes from street-view imagery using diffusion models

Chang Zhao; Kailai Sun; Lingqian Hu; Qingqi Song; Shenhao Wang; Yuebing Liang; Yuzhou Chen

arxiv: 2605.17527 · v1 · pith:GRDTHSS7new · submitted 2026-05-17 · 💻 cs.CV

Pith reviewed 2026-05-20 13:39 UTC · model grok-4.3

classification 💻 cs.CV

keywords street-view imagery · diffusion models · urban planning · generative AI · multimodal dataset · streetscape generation · semantic consistency · visual controls

The pith

Diffusion models synthesize realistic alternative streetscapes conditioned on visual metrics and text from street-view data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first builds a multimodal dataset that pairs street-view images from Chicago and Orlando with textual descriptions, segmentation maps, road masks, and numerical metrics for elements such as greenery or building coverage. It then trains diffusion models on this data to generate new streetscape images that respond to both written prompts and visual controls like target metrics. The work aims to move urban analysis beyond measuring existing conditions toward creating and exploring hypothetical designs. If the models hold up, planners gain a direct visual tool for testing alternative urban scenarios while retaining control over key quantitative indicators.
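Metrics such as greenery or building coverage are commonly computed as per-class pixel shares of a segmentation map. The sketch below assumes that definition; the paper's exact formulas are not given in this summary, and the class IDs and the `view_indices` helper are hypothetical.

```python
from collections import Counter

# Hypothetical class IDs; the paper's actual label scheme is not stated here.
CLASS_NAMES = {0: "road", 1: "building", 2: "vegetation", 3: "sky"}


def view_indices(seg_map):
    """Return each class's share of pixels in a 2D label map (list of rows)."""
    counts = Counter(label for row in seg_map for label in row)
    total = sum(counts.values())
    return {name: counts.get(cid, 0) / total for cid, name in CLASS_NAMES.items()}


# Toy 4x4 segmentation map, for illustration only.
seg = [
    [3, 3, 3, 3],
    [1, 2, 2, 3],
    [1, 2, 2, 1],
    [0, 0, 0, 0],
]
indices = view_indices(seg)
print(indices)
```

Under this assumption, a target such as "raise the green view index" becomes a target pixel share for the vegetation class, which can be checked on any generated image after segmenting it.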

Core claim

We construct a multimodal dataset aligning street-view imagery with textual descriptions, segmentation maps, road masks, and quantitative visual metrics in Chicago and Orlando. Using this dataset, diffusion models produce realistic and semantically consistent streetscape imagery while responding to both textual and imagery controls. Incorporating visual controls improves semantic consistency, reducing the LPIPS index by approximately 6% while maintaining global visual realism. Overall semantic consistency increases by 23.7% in Orlando and 46.4% in Chicago as measured by mIoU, with class-wise gains exceeding 100% for some building view indices. When textual and visual controls conflict, imagery controls consistently dominate.

What carries the argument

Generative multimodal AI framework using diffusion models conditioned on textual descriptions, segmentation maps, road masks, and quantitative visual metrics to synthesize alternative streetscapes.

If this is right

Visual controls improve semantic consistency while preserving global realism in the generated images.
Semantic consistency rises substantially, by 23.7 percent in Orlando and 46.4 percent in Chicago according to mIoU scores.
Class-specific improvements exceed 100 percent for certain categories such as building view indices.
Imagery controls dominate textual controls during conflicts, revealing a control hierarchy.
Streetscape generation can be directed in fine-grained ways by combining or prioritizing text and visual prompts.
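The mIoU figures above can be made concrete with a small sketch. It assumes the reported percentages are relative gains over a text-only baseline, which this summary does not confirm; `miou` and `relative_gain` are illustrative helpers, and the label maps are toy data.

```python
def miou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in either label map."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_p, in_t = p == c, t == c
                inter += in_p and in_t
                union += in_p or in_t
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (0.5 means +50%)."""
    return (improved - baseline) / baseline


# Toy 2x2 label maps, for illustration only.
target = [[0, 0], [1, 1]]
text_only = [[0, 1], [1, 1]]      # one pixel disagrees with the target layout
with_control = [[0, 0], [1, 1]]   # matches the target layout

baseline_miou = miou(text_only, target, num_classes=2)
controlled_miou = miou(with_control, target, num_classes=2)
print(baseline_miou, controlled_miou, relative_gain(baseline_miou, controlled_miou))
```

A relative gain above 1.0 in this convention corresponds to a class-wise improvement exceeding 100 percent, which is only possible when the baseline IoU for that class is below 0.5.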

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework could be applied to additional cities to create location-specific urban design simulators.
Generated scenes might be fed into traffic or environmental models to forecast effects of proposed changes on mobility or heat.
The observed dominance of visual controls points toward developing stronger metric-based conditioning for planning use cases.
Future tests could examine whether the same models support sequential generation showing how streetscapes evolve over time or under different climate conditions.

Load-bearing premise

The multimodal dataset accurately aligns street-view images with textual descriptions, segmentation maps, road masks, and quantitative visual metrics without significant alignment errors or biases.

What would settle it

Compare generated images against held-out real street views to check whether outputs match the input visual metrics, for example by measuring if a high green-view-index condition produces measurably higher greenery scores in the resulting images.
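One way to run such a check is to compare the metric requested as a condition with the metric measured on the generated image. The sketch below is an assumption about how that could be scored, not the paper's protocol; `control_adherence` is a hypothetical helper and the numbers are invented for illustration.

```python
from statistics import mean


def control_adherence(targets, achieved):
    """Return (mean absolute error, share of pairs whose ordering is preserved)."""
    mae = mean(abs(t - a) for t, a in zip(targets, achieved))
    pairs = [
        (i, j)
        for i in range(len(targets))
        for j in range(i + 1, len(targets))
        if targets[i] != targets[j]
    ]
    concordant = sum(
        (targets[i] - targets[j]) * (achieved[i] - achieved[j]) > 0
        for i, j in pairs
    )
    return mae, concordant / len(pairs)


# Hypothetical green view indices: requested as a condition vs. measured
# on the generated images. These are not results from the paper.
target_gvi = [0.10, 0.20, 0.30, 0.40]
measured_gvi = [0.12, 0.18, 0.33, 0.37]

mae, concordance = control_adherence(target_gvi, measured_gvi)
print(mae, concordance)
```

A low error together with a concordance near 1.0 would indicate that higher requested greenery reliably yields higher measured greenery; a concordance near 0.5 would suggest the condition has little effect.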

Figures

Figures reproduced from arXiv: 2605.17527 by Chang Zhao, Kailai Sun, Lingqian Hu, Qingqi Song, Shenhao Wang, Yuebing Liang, Yuzhou Chen.

**Figure 2.** ControlNet architecture. As shown by …

**Figure 3.** Streetscape generation with Stable Diffusion and ControlNet.

**Figure 4.** Evaluating the consistency between textual prompts and generated imagery. Histograms: Density …

**Figure 5.** Increasing the proportion of green and sky view indices without changing the proportion of others.

**Figure 6.** Increasing the proportion of tree view index while reducing the proportions of sky and buildings.

**Figure 7.** Generating street-view image with road mask and context-aware controls. (a) Context specified …

**Figure 8.** Streetscape generation by contrasting imagery and textual controls. The rows show generated …

Original abstract

Street-view imagery (SVI) is widely used to quantify key indicators of urban environment, such as greenery, sky, or road view indices. However, existing studies largely focus on measuring current streetscapes and rarely support the generation of alternative and non-existing urban scenarios, which is a core task in geospatial disciplines such as urban planning and design. To address this gap, we propose a generative multimodal AI framework that synthesizes alternative streetscapes conditioned on targeted visual metrics, enabling direct visual exploration of urban scenarios. We first construct a multimodal dataset that aligns SVIs with textual descriptions, segmentation maps, road masks, and quantitative metrics of visual elements in Chicago and Orlando. Using this dataset, we demonstrate that diffusion models can produce realistic and semantically consistent streetscape imagery while responding to both textual and imagery controls. Our quantitative evaluations show that incorporating visual controls can improve semantic consistency, reducing the LPIPS index by approximately 6% while maintaining global visual realism. In addition, overall semantic consistency increases by 23.7% in Orlando and 46.4% in Chicago, as measured by the mIoU index, with class-wise gains exceeding even 100% improvement for building view indices. Streetscape generation can be controlled in a fine-grained manner by both visual and textual prompts, and when textual and visual controls conflict, imagery controls consistently dominate, indicating a clear control hierarchy and the importance of further developing visual controls for urban scene generation. Overall, this work establishes an important benchmark for streetscape generation using SVIs and diffusion models, and illustrates how generative AI can serve as a practical, scalable, and controllable approach for urban scenario exploration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies diffusion models to generate alternative streetscapes conditioned on SVI metrics and text, with reported gains in consistency, but the unvalidated dataset alignments are the weakest link.

The main point is that this work takes diffusion models and applies them to create new, non-existing street views from existing street-view imagery, conditioned on things like greenery levels or building visibility plus text prompts. It shifts from the usual measurement of current urban scenes to generating alternatives for planning use, which is a practical step forward in that space. They build a multimodal dataset pairing Chicago and Orlando SVIs with text descriptions, segmentation maps, road masks, and quantitative visual metrics, then show the models can respond to both text and imagery controls. Visual controls appear stronger when they conflict with text, and the numbers show LPIPS dropping about 6% with visual controls added, plus mIoU gains of 23.7% in Orlando and 46.4% in Chicago, with some class improvements over 100% for buildings. That control hierarchy and the benchmark setup are the useful parts.

The soft spot is the dataset construction. The alignments between images, texts, segmentations, and metrics are central to every result, yet the abstract gives no details on how they were done, what error rates look like, or any validation steps like manual checks. If automated tools introduced mismatches in segmentations or metric values, the reported improvements could partly reflect data quirks rather than genuine conditioning power. This matters because the whole claim rests on reliable input signals.

The paper is for urban planners, designers, and CV researchers working on generative tools for scenario exploration. A reader focused on real-world applications of diffusion models would find the control experiments and quantitative comparisons worth looking at. It deserves peer review because the idea is scoped clearly and the surface results are concrete, even though the data side needs more scrutiny to hold up.

Referee Report

2 major / 2 minor

Summary. The paper constructs a multimodal dataset aligning street-view images (SVIs) from Chicago and Orlando with textual descriptions, segmentation maps, road masks, and quantitative visual metrics. It then applies diffusion models to generate alternative streetscapes conditioned on these inputs, reporting that visual controls improve semantic consistency (LPIPS reduced by ~6%, mIoU increased by 23.7% in Orlando and 46.4% in Chicago, with some class-wise gains >100%), that imagery controls dominate text in conflicts, and that this provides a benchmark for controllable urban scenario generation.

Significance. If the dataset alignments and results hold, the work offers a practical benchmark for applying diffusion models to geospatial urban planning tasks, enabling visual exploration of non-existing streetscapes via multimodal conditioning and highlighting a control hierarchy favoring imagery over text.

major comments (2)

[Dataset Construction] The multimodal dataset construction (described in the abstract and Methods) is load-bearing for all claims, yet the manuscript provides no details on alignment procedures between SVIs, texts, segmentations, road masks, and metrics, nor on error rates, validation (e.g., manual review or consistency checks), or potential biases from automated tools. Without this, the reported LPIPS reduction and mIoU gains could arise from data properties rather than conditioning strength.
[Quantitative Evaluations] §4 (Quantitative Evaluations): while overall mIoU gains and class-wise improvements (e.g., building view indices) are reported, the manuscript lacks error analysis, failure cases, or statistical significance testing for the ~6% LPIPS and 23.7–46.4% mIoU improvements, making it difficult to assess robustness of the semantic consistency claims.

minor comments (2)

[Abstract] The abstract notes that 'imagery controls consistently dominate' but does not specify the exact protocol for testing conflicting prompts; adding this detail would clarify the control hierarchy result.
[Methods] Training details for the diffusion models (e.g., architecture, conditioning implementation, hyperparameters) are referenced but not fully elaborated, which would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to improve the clarity and robustness of our work.

Referee: [Dataset Construction] The multimodal dataset construction (described in the abstract and Methods) is load-bearing for all claims, yet the manuscript provides no details on alignment procedures between SVIs, texts, segmentations, road masks, and metrics, nor on error rates, validation (e.g., manual review or consistency checks), or potential biases from automated tools. Without this, the reported LPIPS reduction and mIoU gains could arise from data properties rather than conditioning strength.

Authors: We agree that more details on the dataset construction are necessary to fully support our claims. In the revised manuscript, we will add a dedicated subsection in the Methods describing the alignment procedures in detail. This will include the sources of the SVIs, how textual descriptions were generated and aligned, the models used for segmentation maps and road masks, and the formulas for computing the quantitative visual metrics. Additionally, we will report validation procedures such as manual review of a random sample of 500 alignments, estimated alignment error rates, and a discussion of potential biases from the automated tools (e.g., segmentation model inaccuracies in urban scenes). These additions will help demonstrate that the observed improvements stem from the multimodal conditioning rather than underlying data characteristics. revision: yes
Referee: [Quantitative Evaluations] §4 (Quantitative Evaluations): while overall mIoU gains and class-wise improvements (e.g., building view indices) are reported, the manuscript lacks error analysis, failure cases, or statistical significance testing for the ~6% LPIPS and 23.7–46.4% mIoU improvements, making it difficult to assess robustness of the semantic consistency claims.

Authors: We acknowledge the need for more rigorous quantitative analysis. In the revised version, we will enhance §4 by including error analysis through reporting standard deviations or confidence intervals for the LPIPS and mIoU metrics based on multiple evaluation runs. We will also add a subsection on failure cases, providing qualitative examples of generated images where semantic consistency is not achieved and discussing potential reasons. Furthermore, we will conduct and report statistical significance tests, such as Wilcoxon signed-rank tests or t-tests, to evaluate the significance of the reported improvements. These changes will allow readers to better assess the robustness of our semantic consistency claims. revision: yes
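As an illustration of the kind of uncertainty reporting discussed in this response, the sketch below computes a paired percentile-bootstrap confidence interval for the mean per-image improvement. This is a swapped-in technique chosen because it needs only the standard library; it is not the Wilcoxon or t-test the authors mention, and the scores are hypothetical.

```python
import random
from statistics import mean


def paired_bootstrap_ci(baseline, treated, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (treated - baseline)."""
    rng = random.Random(seed)
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    resampled_means = sorted(
        mean(rng.choice(diffs) for _ in range(n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


# Hypothetical per-image mIoU scores; not results from the paper.
text_only = [0.41, 0.38, 0.45, 0.40, 0.36, 0.43, 0.39, 0.42]
with_control = [0.55, 0.49, 0.58, 0.51, 0.50, 0.57, 0.52, 0.54]

point, lower, upper = paired_bootstrap_ci(text_only, with_control)
print(point, lower, upper)
```

An interval that excludes zero would support the claim that visual controls improve consistency on the evaluated images; pairing by image matters because the same scenes are generated under both conditions.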

Circularity Check

0 steps flagged

No circularity: standard diffusion conditioning on newly constructed multimodal dataset

full rationale

The paper's derivation consists of constructing a multimodal dataset aligning SVIs with texts, segmentations, masks and metrics, followed by training and evaluating standard diffusion models under textual and visual controls. Quantitative results (LPIPS reduction, mIoU gains) are obtained from empirical comparisons on held-out generations rather than any fitted parameter being renamed as a prediction or any self-referential definition. No equations, uniqueness theorems, or ansatzes are shown to reduce to prior self-citations or inputs by construction; the central claim therefore retains independent empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard diffusion model properties for multimodal conditioning and the assumption that the custom dataset provides reliable alignment between images and control signals; no new physical entities or ad-hoc constants are introduced.

axioms (1)

domain assumption Diffusion models can be effectively conditioned on multimodal inputs (text and imagery) to produce semantically consistent outputs for urban scenes.
Invoked in the demonstration that models respond to both textual and imagery controls while maintaining realism.

pith-pipeline@v0.9.0 · 5847 in / 1267 out tokens · 60266 ms · 2026-05-20T13:39:32.421334+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

We propose a generative multimodal AI framework that synthesizes alternative streetscapes conditioned on targeted visual metrics... diffusion models can produce realistic and semantically consistent streetscape imagery while responding to both textual and imagery controls.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

Quantitative evaluations show that incorporating visual controls can improve semantic consistency, reducing the LPIPS index by approximately 6%...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 3 internal anchors

[1]

2013 , publisher=

The new science of cities , author=. 2013 , publisher=

work page 2013
[2]

2013 , publisher=

Cities for people , author=. 2013 , publisher=

work page 2013
[3]

Landscape and Urban Planning , author =

Landscape value in urban neighborhoods:. Landscape and Urban Planning , author =. 2022 , pages =. doi:10.1016/j.landurbplan.2022.104357 , language =

work page doi:10.1016/j.landurbplan.2022.104357 2022
[4]

Landscape and Urban Planning , author =

Evaluating the subjective perceptions of streetscapes using street-view images , volume =. Landscape and Urban Planning , author =. 2024 , pages =. doi:10.1016/j.landurbplan.2024.105073 , language =

work page doi:10.1016/j.landurbplan.2024.105073 2024
[5]

Computers, Environment and Urban Systems , author =

Quantifying seasonal bias in street view imagery for urban form assessment:. Computers, Environment and Urban Systems , author =. 2025 , note =

work page 2025
[6]

Landscape and urban planning , volume=

Street view imagery in urban analytics and GIS: A review , author=. Landscape and urban planning , volume=. 2021 , publisher=

work page 2021
[7]

Buildings , volume=

Street View Imagery (SVI) in the built environment: A theoretical and systematic review , author=. Buildings , volume=. 2022 , publisher=

work page 2022
[8]

Landscape and Urban Planning , volume=

Can you see green? Assessing the visibility of urban forests in cities , author=. Landscape and Urban Planning , volume=. 2009 , publisher=

work page 2009
[9]

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages=

What is that building? an end-to-end system for building recognition from streetside images , author=. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages=

work page
[10]

Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility , pages=

Deep learning for automatically detecting sidewalk accessibility problems using streetscape imagery , author=. Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility , pages=

work page
[11]

Advances in neural information processing systems , volume=

Denoising diffusion probabilistic models , author=. Advances in neural information processing systems , volume=

work page
[12]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[13]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Layoutdm: Discrete diffusion model for controllable layout generation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[14]

2022 , keywords =

Environment and Planning B: Urban Analytics and City Science , author =. 2022 , keywords =. doi:10.1177/23998083211023516 , language =

work page doi:10.1177/23998083211023516 2022
[15]

2022 , keywords =

International Journal of Geographical Information Science , author =. 2022 , keywords =. doi:10.1080/13658816.2022.2041643 , language =

work page doi:10.1080/13658816.2022.2041643 2022
[16]

Advances in neural information processing systems , volume=

Diffusion models beat gans on image synthesis , author=. Advances in neural information processing systems , volume=

work page
[17]

, author=

Methods in environmental and behavioral research. , author=. 1987 , publisher=

work page 1987
[18]

Journal of Planning Literature , volume=

Virtual reality and urban simulation in planning: A literature review and topical bibliography , author=. Journal of Planning Literature , volume=. 2001 , publisher=

work page 2001
[19]

2001 , publisher=

Visualizing the city: communicating urban design to planners and decision-makers , author=. 2001 , publisher=

work page 2001
[20]

Environment and Planning B: Urban Analytics and City Science , volume=

In search of visualization challenges: The development and implementation of visualization tools for supporting dialogue in urban planning processes , author=. Environment and Planning B: Urban Analytics and City Science , volume=. 2017 , publisher=

work page 2017
[21]

Landscape and urban planning , volume=

Guidance for crystal ball gazers: developing a code of ethics for landscape visualization , author=. Landscape and urban planning , volume=. 2001 , publisher=

work page 2001
[22]

2018 , publisher=

Inventing future cities , author=. 2018 , publisher=

work page 2018
[23]

2012 , publisher=

A framework for geodesign: Changing geography by design , author=. 2012 , publisher=

work page 2012
[24]

Journal of Urban Mobility , volume=

Creating visualizations using generative AI to guide decision-making in street designs: A viewpoint , author=. Journal of Urban Mobility , volume=. 2025 , publisher=

work page 2025
[25]

Landscape and Urban Planning , volume=

Generative AI text-to-image for community participation in landscape planning , author=. Landscape and Urban Planning , volume=. 2025 , publisher=

work page 2025
[26]

International journal of environmental research and public health , volume=

How green are the streets within the sixth ring road of Beijing? An analysis based on tencent street view pictures and the green view index , author=. International journal of environmental research and public health , volume=. 2018 , publisher=

work page 2018
[27]

Building and Environment , volume=

Mapping sky, tree, and building view factors of street canyons in a high-density urban environment , author=. Building and Environment , volume=. 2018 , publisher=

work page 2018
[28]

International journal of environmental research and public health , volume=

A systematic measurement of street quality through multi-sourced urban data: A human-oriented analysis , author=. International journal of environmental research and public health , volume=. 2019 , publisher=

work page 2019
[29]

Environment and Planning B: Urban Analytics and City Science , volume=

Examining the spatial distribution and temporal change of the green view index in New York City using Google Street View images and deep learning , author=. Environment and Planning B: Urban Analytics and City Science , volume=. 2021 , publisher=

work page 2021
[30]

Landscape and Urban Planning , volume=

Analyzing the effects of Green View Index of neighborhood streets on walking time using Google Street View and deep learning , author=. Landscape and Urban Planning , volume=. 2021 , publisher=

work page 2021
[31]

Sustainable Cities and Society , author =

Accessing eye-level greenness visibility from open-source street view images:. Sustainable Cities and Society , author =. 2024 , pages =. doi:10.1016/j.scs.2024.105262 , language =

work page doi:10.1016/j.scs.2024.105262 2024
[32]

Landscape and Urban Planning , author =

Using. Landscape and Urban Planning , author =. 2019 , pages =. doi:10.1016/j.landurbplan.2018.08.029 , language =

work page doi:10.1016/j.landurbplan.2018.08.029 2019
[33]

Sustainable Cities and Society , author =

Social inequalities in neighborhood visual walkability:. Sustainable Cities and Society , author =. 2019 , pages =. doi:10.1016/j.scs.2019.101605 , language =

work page doi:10.1016/j.scs.2019.101605 2019
[34]

Landscape and Urban Planning , author =

Quantifying the shade provision of street trees in urban landscape:. Landscape and Urban Planning , author =. 2018 , pages =. doi:10.1016/j.landurbplan.2017.08.011 , language =

work page doi:10.1016/j.landurbplan.2017.08.011 2018
[35]

Urban Climate , volume=

Heat vulnerability and street-level outdoor thermal comfort in the city of Houston: Application of Google street view image derived SVFs , author=. Urban Climate , volume=. 2023 , publisher=

work page 2023
[36]

Landscape and Urban Planning , volume=

Quantification through deep learning of sky view factor and greenery on urban streets during hot and cool seasons , author=. Landscape and Urban Planning , volume=. 2023 , publisher=

work page 2023
[37]

Urban Climate , volume=

Sky view factor estimation from street view images based on semantic segmentation , author=. Urban Climate , volume=. 2021 , publisher=

work page 2021
[38]

American Economic Review , volume=

Cities are physical too: Using computer vision to measure the quality and impact of urban appearance , author=. American Economic Review , volume=. 2016 , publisher=

work page 2016
[39]

Computer Vision--ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016, Proceedings, Part I 14 , pages=

Deep learning the city: Quantifying urban perception at a global scale , author=. Computer Vision--ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016, Proceedings, Part I 14 , pages=. 2016 , organization=

work page 2016
[40]

Transactions in GIS , volume=

Investigating the association between streetscapes and human walking activities using Google Street View and human trajectory data , author=. Transactions in GIS , volume=. 2018 , publisher=

work page 2018
[41]

Sustainable Cities and Society , author =

Perceiving the fine-scale urban poverty using street view images through a vision-language model , volume =. Sustainable Cities and Society , author =. 2025 , pages =. doi:10.1016/j.scs.2025.106267 , language =

work page doi:10.1016/j.scs.2025.106267 2025
[42]

Sustainable Cities and Society , volume=

Assessing spatiotemporal characteristics of urban heat islands from the perspective of an urban expansion and green infrastructure , author=. Sustainable Cities and Society , volume=. 2021 , publisher=

work page 2021
[43]

Sustainable Cities and Society , author =

Measuring visual walkability perception using panoramic street view images, virtual reality, and deep learning , volume =. Sustainable Cities and Society , author =. 2022 , pages =. doi:10.1016/j.scs.2022.104140 , language =

work page doi:10.1016/j.scs.2022.104140 2022
[44]

Sustainable Cities and Society , author =

An innovative framework combining a. Sustainable Cities and Society , author =. 2025 , pages =. doi:10.1016/j.scs.2025.106384 , language =

work page doi:10.1016/j.scs.2025.106384 2025
[45]

Scientific Reports , volume=

Measuring social, environmental and health inequalities using deep learning and street imagery , author=. Scientific Reports , volume=. 2019 , doi=

work page 2019
[46]

Landscape and Urban Planning , author =

Predicting perceptions of the built environment using. Landscape and Urban Planning , author =. 2021 , pages =. doi:10.1016/j.landurbplan.2021.104257 , language =

work page doi:10.1016/j.landurbplan.2021.104257 2021
[47]

International Journal of Geographical Information Science , author =

Identifying urban villages: an attention-based deep learning approach that integrates remote sensing and street-level images , volume =. International Journal of Geographical Information Science , author =. 2025 , pages =. doi:10.1080/13658816.2024.2442096 , language =

work page doi:10.1080/13658816.2024.2442096 2025
[48]

2013 , doi =

Spatial. 2013 , doi =

work page 2013
[49]

Landscape and Urban Planning , author =

Street view imagery in urban analytics and. Landscape and Urban Planning , author =. 2021 , pages =. doi:10.1016/j.landurbplan.2021.104217 , language =

work page doi:10.1016/j.landurbplan.2021.104217 2021
[50]

Proceedings of the 32nd

Han, Zhenyu and Zhang, Xin and Xi, Yanxin and Luo, Yan and Xia, Tong and Li, Yong , month = oct, year =. Proceedings of the 32nd. doi:10.1145/3678717.3691242 , language =

work page doi:10.1145/3678717.3691242
[51]

A Neural Representation of Sketch Drawings

A neural representation of sketch drawings , author=. arXiv preprint arXiv:1704.03477 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[52]

International conference on machine learning , pages=

Autoencoding beyond pixels using a learned similarity metric , author=. International conference on machine learning , pages=. 2016 , organization=

work page 2016
[53]

Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer

Understanding and improving interpolation in autoencoders via an adversarial regularizer , author=. arXiv preprint arXiv:1807.07543 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[54]

ACM Computing Surveys (CSUR) , volume=

Generative adversarial networks (GANs) challenges, solutions, and future directions , author=. ACM Computing Surveys (CSUR) , volume=. 2021 , publisher=

work page 2021
[55]

International conference on machine learning , pages=

Improved denoising diffusion probabilistic models , author=. International conference on machine learning , pages=. 2021 , organization=

work page 2021
[56]

Hierarchical Text-Conditional Image Generation with CLIP Latents

Hierarchical text-conditional image generation with clip latents , author=. arXiv preprint arXiv:2204.06125 , volume=

work page internal anchor Pith review Pith/arXiv arXiv
[57]

Advances in neural information processing systems , volume=

Photorealistic text-to-image diffusion models with deep language understanding , author=. Advances in neural information processing systems , volume=

work page
[58]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

High-resolution image synthesis with latent diffusion models , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[59]

Zhang, Lvmin; Rao, Anyi; Agrawala, Maneesh. Adding conditional control to text-to-image diffusion models.
[60]

Text Semantics to Image Generation: A method of building facades design base on Stable Diffusion model. The International Conference on Computational Design and Robotic Fabrication, 2023.
[61]

InstructPix2Pix: Learning to follow image editing instructions. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[62]

MagicDrive: Street view generation with diverse 3D geometry control. arXiv preprint arXiv:2310.02601.
[63]

Deng, Boyang; Tucker, Richard; Li, Zhengqi; Guibas, Leonidas; Snavely, Noah; Wetzstein, Gordon. Streetscapes: … Special … July. doi:10.1145/3641519.3657513.
[64]

CrossViewDiff: A cross-view diffusion model for satellite-to-street view synthesis. arXiv preprint arXiv:2408.14765.
[65]

Sat2Scene: 3D urban scene generation from satellite images with diffusion. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[66]

Artificial intelligence-aided design: … Environment and Planning B: Urban Analytics and City Science, 2019.
[67]

Urban visual uniqueness: A landmark-free framework to quantify city's identity and distinctiveness from everyday scenes. Computers, Environment and Urban Systems, 2025.
[68]

Mapping facade materials utilizing zero-shot segmentation for applications in urban microclimate research. Scientific Reports, 2025.
[69]

Exploring the effects of roadside vegetation on the urban thermal environment using street view images. International Journal of Environmental Research and Public Health, 2022.
[70]

Does the visibility of greenery increase perceived safety in urban areas? Evidence from the Place Pulse 1.0 dataset. ISPRS International Journal of Geo-Information, 2015.
[71]

Image generative AI to design public spaces: A reflection of how AI could improve co-design of public parks. Digital Government: Research and Practice, 2025.
[72]

Towards human-AI collaborative urban science research enabled by pre-trained large language models. Urban Informatics, 2024.
[73]

A new agenda for AI-based urban design and planning. Artificial Intelligence in Urban Planning and Design, 2022.
