OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View Imagery
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2
verdicts
UNVERDICTED 2
representative citing papers
Fine-tuning Gemma 3 27B on modest human-labeled street-view data yields building condition scores that align with and sometimes exceed individual human raters on correlation metrics, with knowledge distillation producing comparable smaller LLM, CNN, and transformer models.
citing papers explorer
- V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?
V-RoAst applies zero-shot VLMs (Gemini-1.5-flash, GPT-4o-mini) to iRAP road safety attribute classification on a new ThaiRAP image dataset and compares them to CNN baselines, finding better generalization to unseen classes but weaker spatial reasoning.
- Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery
Fine-tuning Gemma 3 27B on modest human-labeled street-view data yields building condition scores that align with and sometimes exceed individual human raters on correlation metrics, with knowledge distillation producing comparable smaller LLM, CNN, and transformer models.