
arxiv: 2605.17527 · v1 · pith:GRDTHSS7new · submitted 2026-05-17 · 💻 cs.CV

Designing streetscapes from street-view imagery using diffusion models

Pith reviewed 2026-05-20 13:39 UTC · model grok-4.3

classification 💻 cs.CV
keywords: street-view imagery · diffusion models · urban planning · generative AI · multimodal dataset · streetscape generation · semantic consistency · visual controls

The pith

Diffusion models synthesize realistic alternative streetscapes conditioned on visual metrics and text from street-view data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first builds a multimodal dataset that pairs street-view images from Chicago and Orlando with textual descriptions, segmentation maps, road masks, and numerical metrics for elements such as greenery or building coverage. It then trains diffusion models on this data to generate new streetscape images that respond to both written prompts and visual controls like target metrics. The work aims to move urban analysis beyond measuring existing conditions toward creating and exploring hypothetical designs. If the models hold up, planners gain a direct visual tool for testing alternative urban scenarios while retaining control over key quantitative indicators.

Core claim

We construct a multimodal dataset aligning street-view imagery with textual descriptions, segmentation maps, road masks, and quantitative visual metrics in Chicago and Orlando. Using this dataset, diffusion models produce realistic and semantically consistent streetscape imagery while responding to both textual and imagery controls. Incorporating visual controls improves semantic consistency, reducing the LPIPS index by approximately 6% while maintaining global visual realism. Overall semantic consistency increases by 23.7% in Orlando and 46.4% in Chicago as measured by mIoU, with class-wise gains exceeding 100% for some building view indices. When textual and visual controls conflict, the 1

What carries the argument

Generative multimodal AI framework using diffusion models conditioned on textual descriptions, segmentation maps, road masks, and quantitative visual metrics to synthesize alternative streetscapes.

If this is right

  • Visual controls improve semantic consistency while preserving global realism in the generated images.
  • Semantic consistency rises substantially, by 23.7 percent in Orlando and 46.4 percent in Chicago according to mIoU scores.
  • Class-specific improvements exceed 100 percent for certain categories such as building view indices.
  • Imagery controls dominate textual controls during conflicts, revealing a control hierarchy.
  • Streetscape generation can be directed in fine-grained ways by combining or prioritizing text and visual prompts.
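
The mIoU figures above compare segmentation label maps, and the percentage gains read most naturally as relative changes over a baseline. The sketch below illustrates both computations on toy data. It assumes flat label maps of integer class ids and assumes the reported gains are relative rather than percentage-point differences; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target.

    pred and target are equal-length flat sequences of integer class ids.
    Classes absent from both maps are skipped rather than counted as zero.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def relative_gain_percent(baseline, improved):
    """Relative change of improved over baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Toy 6-pixel example with three classes (illustrative values only).
target = [0, 0, 1, 1, 2, 2]        # segmentation used as the control
text_only = [0, 1, 1, 1, 2, 0]     # segmentation of a text-only generation
with_control = [0, 0, 1, 1, 2, 0]  # segmentation of a visually controlled generation

miou_text = mean_iou(text_only, target, num_classes=3)
miou_ctrl = mean_iou(with_control, target, num_classes=3)
gain = relative_gain_percent(miou_text, miou_ctrl)
```

Under this reading, a class with a very low baseline IoU can show a gain above 100 percent even when its absolute IoU remains modest, which is worth keeping in mind for the class-wise figures.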

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be applied to additional cities to create location-specific urban design simulators.
  • Generated scenes might be fed into traffic or environmental models to forecast effects of proposed changes on mobility or heat.
  • The observed dominance of visual controls points toward developing stronger metric-based conditioning for planning use cases.
  • Future tests could examine whether the same models support sequential generation showing how streetscapes evolve over time or under different climate conditions.

Load-bearing premise

The multimodal dataset accurately aligns street-view images with textual descriptions, segmentation maps, road masks, and quantitative visual metrics without significant alignment errors or biases.

What would settle it

Compare generated images against held-out real street views to check whether outputs match the input visual metrics, for example by measuring if a high green-view-index condition produces measurably higher greenery scores in the resulting images.
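
A minimal sketch of that check, assuming a view index is defined as the fraction of pixels assigned to a class in a segmentation label map. The class id, the tolerance, and the toy label maps are hypothetical; the paper's exact metric formulas are not given here.

```python
VEGETATION = 8  # hypothetical class id for greenery in the segmentation output


def view_index(label_map, class_ids):
    """Fraction of pixels in a 2D label map whose class id is in class_ids."""
    total = sum(len(row) for row in label_map)
    hits = sum(1 for row in label_map for px in row if px in class_ids)
    return hits / total if total else 0.0


def meets_target(label_map, class_ids, target, tolerance=0.05):
    """True if the measured view index lies within tolerance of the target."""
    return abs(view_index(label_map, class_ids) - target) <= tolerance


# Toy 2x4 label maps standing in for segmented generated images.
low_green = [[0, 0, 8, 0],
             [0, 0, 0, 0]]
high_green = [[8, 8, 8, 0],
              [8, 0, 8, 0]]

gvi_low = view_index(low_green, {VEGETATION})
gvi_high = view_index(high_green, {VEGETATION})
```

In practice the label maps would come from running a segmentation model on the generated images, so any error in that model carries over into the measured index.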

Figures

Figures reproduced from arXiv: 2605.17527 by Chang Zhao, Kailai Sun, Lingqian Hu, Qingqi Song, Shenhao Wang, Yuebing Liang, Yuzhou Chen.

Figure 1: A data point from our multimodal dataset, including a textual description, a road mask, a street [...]
Figure 2: ControlNet architecture
Figure 3: Streetscape generation with Stable Diffusion and ControlNet
Figure 4: Evaluating the consistency between textual prompts and generated imagery. Histograms: Density [...]
Figure 5: Increasing the proportion of green and sky view indices without changing the proportion of others
Figure 6: Increasing the proportion of tree view index while reducing the proportions of sky and buildings
Figure 7: Generating street-view image with road mask and context-aware controls. (a) Context specified [...]
Figure 8: Streetscape generation by contrasting imagery and textual controls. The rows show generated [...]
Abstract

Street-view imagery (SVI) is widely used to quantify key indicators of urban environment, such as greenery, sky, or road view indices. However, existing studies largely focus on measuring current streetscapes and rarely support the generation of alternative and non-existing urban scenarios, which is a core task in geospatial disciplines such as urban planning and design. To address this gap, we propose a generative multimodal AI framework that synthesizes alternative streetscapes conditioned on targeted visual metrics, enabling direct visual exploration of urban scenarios. We first construct a multimodal dataset that aligns SVIs with textual descriptions, segmentation maps, road masks, and quantitative metrics of visual elements in Chicago and Orlando. Using this dataset, we demonstrate that diffusion models can produce realistic and semantically consistent streetscape imagery while responding to both textual and imagery controls. Our quantitative evaluations show that incorporating visual controls can improve semantic consistency, reducing the LPIPS index by approximately 6% while maintaining global visual realism. In addition, overall semantic consistency increases by 23.7% in Orlando and 46.4% in Chicago, as measured by the mIoU index, with class-wise gains exceeding even 100% improvement for building view indices. Streetscape generation can be controlled in a fine-grained manner by both visual and textual prompts, and when textual and visual controls conflict, imagery controls consistently dominate, indicating a clear control hierarchy and the importance of further developing visual controls for urban scene generation. Overall, this work establishes an important benchmark for streetscape generation using SVIs and diffusion models, and illustrates how generative AI can serve as a practical, scalable, and controllable approach for urban scenario exploration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper constructs a multimodal dataset aligning street-view images (SVIs) from Chicago and Orlando with textual descriptions, segmentation maps, road masks, and quantitative visual metrics. It then applies diffusion models to generate alternative streetscapes conditioned on these inputs, reporting that visual controls improve semantic consistency (LPIPS reduced by ~6%, mIoU increased by 23.7% in Orlando and 46.4% in Chicago, with some class-wise gains >100%), that imagery controls dominate text in conflicts, and that this provides a benchmark for controllable urban scenario generation.

Significance. If the dataset alignments and results hold, the work offers a practical benchmark for applying diffusion models to geospatial urban planning tasks, enabling visual exploration of non-existing streetscapes via multimodal conditioning and highlighting a control hierarchy favoring imagery over text.

major comments (2)
  1. [Dataset Construction] The multimodal dataset construction (described in the abstract and Methods) is load-bearing for all claims, yet the manuscript provides no details on alignment procedures between SVIs, texts, segmentations, road masks, and metrics, nor on error rates, validation (e.g., manual review or consistency checks), or potential biases from automated tools. Without this, the reported LPIPS reduction and mIoU gains could arise from data properties rather than conditioning strength.
  2. [Quantitative Evaluations] §4 (Quantitative Evaluations): while overall mIoU gains and class-wise improvements (e.g., building view indices) are reported, the manuscript lacks error analysis, failure cases, or statistical significance testing for the ~6% LPIPS and 23.7–46.4% mIoU improvements, making it difficult to assess robustness of the semantic consistency claims.
minor comments (2)
  1. [Abstract] The abstract notes that 'imagery controls consistently dominate' but does not specify the exact protocol for testing conflicting prompts; adding this detail would clarify the control hierarchy result.
  2. [Methods] Training details for the diffusion models (e.g., architecture, conditioning implementation, hyperparameters) are referenced but not fully elaborated; fuller detail would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to improve the clarity and robustness of our work.

Point-by-point responses
  1. Referee: [Dataset Construction] The multimodal dataset construction (described in the abstract and Methods) is load-bearing for all claims, yet the manuscript provides no details on alignment procedures between SVIs, texts, segmentations, road masks, and metrics, nor on error rates, validation (e.g., manual review or consistency checks), or potential biases from automated tools. Without this, the reported LPIPS reduction and mIoU gains could arise from data properties rather than conditioning strength.

    Authors: We agree that more details on the dataset construction are necessary to fully support our claims. In the revised manuscript, we will add a dedicated subsection in the Methods describing the alignment procedures in detail. This will include the sources of the SVIs, how textual descriptions were generated and aligned, the models used for segmentation maps and road masks, and the formulas for computing the quantitative visual metrics. Additionally, we will report validation procedures such as manual review of a random sample of 500 alignments, estimated alignment error rates, and a discussion of potential biases from the automated tools (e.g., segmentation model inaccuracies in urban scenes). These additions will help demonstrate that the observed improvements stem from the multimodal conditioning rather than underlying data characteristics. revision: yes
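
    One way such a manual review could be turned into an estimated alignment error rate is sketched below: draw a reproducible random sample of records, count the misaligned ones by hand, and report a Wilson score interval for the error proportion. The dataset size, the record ids, and the count of 25 errors are made-up placeholders, not results from the paper, and the helper names are hypothetical.

```python
import math
import random


def sample_for_review(record_ids, k, seed=0):
    """Draw k distinct record ids for manual inspection, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(list(record_ids), k)


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Placeholder dataset of 20,000 records; 500 are sampled for review.
review_ids = sample_for_review(range(20000), k=500)

# Suppose reviewers flag 25 of the 500 as misaligned (illustrative count only).
low, high = wilson_interval(errors=25, n=500)
```

    With these placeholder numbers the interval spans roughly 3.4% to 7.3%, which shows the precision a sample of 500 can offer for an error rate near 5%.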

  2. Referee: [Quantitative Evaluations] §4 (Quantitative Evaluations): while overall mIoU gains and class-wise improvements (e.g., building view indices) are reported, the manuscript lacks error analysis, failure cases, or statistical significance testing for the ~6% LPIPS and 23.7–46.4% mIoU improvements, making it difficult to assess robustness of the semantic consistency claims.

    Authors: We acknowledge the need for more rigorous quantitative analysis. In the revised version, we will enhance §4 by including error analysis through reporting standard deviations or confidence intervals for the LPIPS and mIoU metrics based on multiple evaluation runs. We will also add a subsection on failure cases, providing qualitative examples of generated images where semantic consistency is not achieved and discussing potential reasons. Furthermore, we will conduct and report statistical significance tests, such as Wilcoxon signed-rank tests or t-tests, to evaluate the significance of the reported improvements. These changes will allow readers to better assess the robustness of our semantic consistency claims. revision: yes
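
    As an illustration of a paired significance test on per-image scores, the sketch below uses a sign-flip permutation test. This is a stand-in chosen because it needs only the standard library; it is not the Wilcoxon signed-rank test or t-test named in the response, and the per-image mIoU values are invented for the example.

```python
import random


def paired_sign_flip_test(baseline, treated, n_perm=10000, seed=0):
    """Two-sided Monte Carlo sign-flip permutation test on paired differences.

    Under the null hypothesis each paired difference is equally likely to be
    positive or negative, so signs are flipped at random and the mean of the
    flipped differences is compared with the observed mean.
    """
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


# Invented per-image mIoU scores for the same ten prompts under two settings.
text_only = [0.40, 0.42, 0.38, 0.45, 0.41, 0.39, 0.44, 0.43, 0.40, 0.42]
with_visual = [0.46, 0.47, 0.45, 0.50, 0.47, 0.44, 0.51, 0.48, 0.46, 0.49]

p_value = paired_sign_flip_test(text_only, with_visual)
```

    Pairing by image or prompt matters here: it removes between-scene variation, so the test responds to the effect of the control rather than to how hard each scene is.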

Circularity Check

0 steps flagged

No circularity: standard diffusion conditioning on newly constructed multimodal dataset

Full rationale

The paper's derivation consists of constructing a multimodal dataset aligning SVIs with texts, segmentations, masks and metrics, followed by training and evaluating standard diffusion models under textual and visual controls. Quantitative results (LPIPS reduction, mIoU gains) are obtained from empirical comparisons on held-out generations rather than any fitted parameter being renamed as a prediction or any self-referential definition. No equations, uniqueness theorems, or ansatzes are shown to reduce to prior self-citations or inputs by construction; the central claim therefore retains independent empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard diffusion model properties for multimodal conditioning and the assumption that the custom dataset provides reliable alignment between images and control signals; no new physical entities or ad-hoc constants are introduced.

axioms (1)
  • Domain assumption: Diffusion models can be effectively conditioned on multimodal inputs (text and imagery) to produce semantically consistent outputs for urban scenes.
    Invoked in the demonstration that models respond to both textual and imagery controls while maintaining realism.

pith-pipeline@v0.9.0 · 5847 in / 1267 out tokens · 60266 ms · 2026-05-20T13:39:32.421334+00:00 · methodology


