Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception
Pith reviewed 2026-05-12 03:48 UTC · model grok-4.3
The pith
Urban-ImageNet supplies over two million social-media images of Chinese cities organized by an urban-theory taxonomy to test AI perception of public spaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Urban-ImageNet organizes user-generated images into a ten-class taxonomy grounded in urban studies that distinguishes activated public spaces, consumption areas, accommodation, portraits, and non-spatial social-media content. The resulting benchmark evaluates representative vision, vision-language, and segmentation models on classification, retrieval, and instance-level tasks, revealing strong supervised performance on scene labels but persistent challenges in cross-modal alignment and object segmentation that narrow only modestly with larger balanced training sets.
What carries the argument
HUSIC taxonomy, a hierarchical ten-class system grounded in urban theory that separates activated versus non-activated public spaces, exterior versus interior environments, and spatial versus non-spatial content to structure evaluation across modalities and scales.
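The hierarchy HUSIC imposes can be pictured as a small tree. The sketch below is an illustration only: this review does not enumerate the ten class names, so the leaf labels here are invented placeholders arranged along the activated/non-activated, exterior/interior, and spatial/non-spatial splits described above.

```python
# Hedged sketch of a HUSIC-style hierarchy. The leaf names are invented
# placeholders, not the paper's actual ten classes; a faithful version
# would carry exactly ten leaves.
HUSIC_SKETCH = {
    "spatial": {
        "exterior": {
            "public_space": ["activated_public_space", "non_activated_public_space"],
            "other_exterior": ["street_or_landscape"],
        },
        "interior": {
            "accommodation": ["accommodation_space"],
            "consumption": ["consumption_space"],
        },
    },
    "non_spatial": {
        "people": ["portrait"],
        "media": ["non_spatial_post"],
    },
}

def leaf_classes(node):
    """Collect leaf labels from the nested dict/list structure."""
    if isinstance(node, list):
        return list(node)
    leaves = []
    for child in node.values():
        leaves.extend(leaf_classes(child))
    return leaves
```

A structure like this makes the evaluation splits mechanical: classifiers can be scored at any internal node (e.g. spatial vs. non-spatial) as well as at the leaves.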
If this is right
- Supervised models reach high accuracy on urban scene classification once trained on the provided 1K to 100K subsets.
- Cross-modal image-text retrieval remains harder than classification, showing limits in current vision-language alignment for urban content.
- Instance segmentation improves with larger training volumes but stays more challenging than whole-scene classification.
- The multi-scale design lets researchers measure exactly how much additional balanced data closes the performance gaps on each task.
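The multi-scale claim above can be made concrete with a toy measurement: fit a log-linear trend of score versus training-set size across the 1K/10K/100K tiers and invert it to estimate the data volume needed to hit a target. All numbers below are invented for illustration, not the paper's results, and the log-linear form is itself an assumption.

```python
import math

def fit_log_linear(sizes, scores):
    """Least-squares fit of score ~ a + b * log10(n) across subset sizes."""
    xs = [math.log10(n) for n in sizes]
    k = len(xs)
    mx, my = sum(xs) / k, sum(scores) / k
    b = sum((x - mx) * (y - my) for x, y in zip(xs, scores)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def data_needed(a, b, target):
    """Invert the fit: training volume at which the trend reaches `target`."""
    return 10 ** ((target - a) / b)

# Invented scores for illustration only (not the paper's numbers).
sizes = [1_000, 10_000, 100_000]
segmentation = [0.22, 0.31, 0.38]  # hypothetical T3 mask AP per tier

a, b = fit_log_linear(sizes, segmentation)
# Extrapolated volume to reach AP 0.50, under the strong log-linear assumption.
print(round(data_needed(a, b, 0.50)))
```

Comparing the fitted slope `b` across T1, T2, and T3 is one way to state precisely which task benefits most from additional balanced data.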
Where Pith is reading between the lines
- The same taxonomy and task structure could be applied to social-media imagery from other countries or platforms to test whether the observed performance patterns hold beyond Chinese cities.
- Gaps in retrieval and segmentation suggest that future models may need explicit mechanisms for functional and social context rather than purely visual features.
- If the benchmark succeeds, planners and researchers could use it to train systems that automatically analyze public-space usage from the large volume of online photos already being shared.
Load-bearing premise
That the HUSIC taxonomy correctly identifies the spatial, social, and functional distinctions that matter most for how people experience urban spaces, and that the Weibo images represent typical city environments without major selection bias.
What would settle it
A sample of images labeled by independent urban experts shows frequent disagreement with the HUSIC classes, or models trained on the dataset achieve no better accuracy on an independent urban image collection than models trained on generic scene datasets.
Original abstract
We present Urban-ImageNet, a large-scale multi-modal dataset and evaluation benchmark for urban space perception from user-generated social media imagery. The corpus contains over 2 Million public social media images and paired textual posts collected from Weibo across 61 urban sites in 24 Chinese cities across 2019-2025, with controlled benchmark subsets at 1K, 10K, and 100K scale and a full 2M corpus for large-scale training and evaluation. Urban-ImageNet is organized by HUSIC, a Hierarchical Urban Space Image Classification framework that defines a 10-class taxonomy grounded in urban theory. The taxonomy is designed to distinguish activated and non-activated public spaces, exterior and interior urban environments, accommodation spaces, consumption content, portraits, and non-spatial social-media content. Rather than treating urban imagery as generic scene data, Urban-ImageNet evaluates whether machine perception models can capture spatial, social, and functional distinctions that are central to urban studies. The benchmark supports three tasks within one standardized library: (T1) urban scene semantic classification, (T2) cross-modal image-text retrieval, and (T3) instance segmentation. Our experiments evaluate representative vision, vision-language, and segmentation models, revealing strong performance on supervised scene classification but more challenging behavior in cross-modal retrieval and instance-level urban object segmentation. A multi-scale study further examines how model performance changes as balanced training data increases from 1K, 10K to 100K images. Urban-ImageNet provides a unified, theory-grounded, multi-city benchmark for evaluating how AI systems perceive and interpret contemporary urban spaces across modalities, scales, and task formulations. Dataset and benchmark are available at: huggingface.co/datasets/Yiwei-Ou/Urban-ImageNet and github.com/yiasun/dataset-2.
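Cross-modal retrieval of the T2 kind is conventionally scored with Recall@k over an image-text similarity matrix. A minimal, dependency-free sketch with toy two-dimensional "embeddings" (nothing here reflects the paper's actual features, models, or scores):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def recall_at_k(image_vecs, text_vecs, k=1):
    """Fraction of texts whose matching image (same index) ranks in the top k."""
    hits = 0
    for i, t in enumerate(text_vecs):
        sims = [cosine(t, im) for im in image_vecs]
        top = sorted(range(len(sims)), key=lambda j: -sims[j])[:k]
        hits += i in top
    return hits / len(text_vecs)

# Toy paired embeddings: each text vector is a noisy copy of its image vector.
images = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
texts = [(0.9, 0.1), (0.2, 0.9), (0.6, 0.8)]
print(recall_at_k(images, texts, k=1))  # → 1.0
```

Real vision-language baselines such as CLIP or BLIP would supply the embeddings; the "cross-modal alignment" gap the review describes shows up as Recall@k falling well below the classification accuracy on the same images.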
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Urban-ImageNet, a dataset of over 2 million Weibo user-generated images and paired text posts from 61 sites across 24 Chinese cities (2019-2025), organized under the HUSIC hierarchical taxonomy that distinguishes activated/non-activated public spaces, interior/exterior environments, accommodation, consumption, portraits, and non-spatial content. It defines a standardized benchmark with three tasks—T1 urban scene semantic classification, T2 cross-modal image-text retrieval, and T3 instance segmentation—evaluated on controlled subsets (1K/10K/100K) and the full 2M corpus using representative vision, vision-language, and segmentation models. Experiments report differential performance (strong on supervised classification, weaker on retrieval and segmentation) and examine scaling effects with increasing balanced training data. The work positions the resource as a theory-grounded, multi-city, multi-modal benchmark for AI perception of contemporary urban spaces, with public release via Hugging Face and GitHub.
Significance. If the HUSIC taxonomy proves reliable and the Weibo corpus representative, the contribution is a large-scale, publicly available multi-modal benchmark that bridges computer vision and urban studies by focusing on spatial, social, and functional distinctions rather than generic scenes. The multi-task formulation, controlled scaling subsets, and public dataset/code release are strengths that enable reproducible evaluation and interdisciplinary follow-up work. The reported performance gaps across tasks and scales provide initial empirical signals about model limitations in urban contexts.
major comments (3)
- [Abstract and §3] Abstract and §3 (Data Collection): The central claim that Urban-ImageNet supplies a valid benchmark for 'contemporary urban spaces' rests on the HUSIC taxonomy capturing central distinctions, yet the manuscript provides no details on the labeling process, inter-annotator agreement scores, or bias mitigation procedures for the 10-class hierarchy.
- [§3 and §5] §3 (Dataset Curation) and §5 (Experiments): The 2M Weibo corpus and its subsets are asserted to represent typical urban spaces across 24 cities, but no quantitative validation (e.g., comparison to official land-use maps, demographic controls, or multi-source cross-checks) is reported; this leaves the representativeness claim vulnerable to known social-media selection biases toward salient/positive content.
- [§5] §5 (Multi-scale Study): Performance trends are shown as training data grows from 1K to 100K images, but the results lack statistical significance testing, confidence intervals, or ablation against stronger baselines, weakening the interpretation of scaling behavior for the three tasks.
minor comments (2)
- [§2] The HUSIC taxonomy is introduced as 'grounded in urban theory,' but the manuscript would benefit from explicit citations to the specific urban studies references that motivate each of the 10 classes.
- [§5] Figure and table captions for the benchmark results could more clearly indicate which subsets (1K/10K/100K) correspond to each reported metric to improve readability.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments point by point below, indicating the revisions we plan to make.
-
Referee: [Abstract and §3] Abstract and §3 (Data Collection): The central claim that Urban-ImageNet supplies a valid benchmark for 'contemporary urban spaces' rests on the HUSIC taxonomy capturing central distinctions, yet the manuscript provides no details on the labeling process, inter-annotator agreement scores, or bias mitigation procedures for the 10-class hierarchy.
Authors: We agree that additional details on the taxonomy construction are necessary to support the benchmark's validity. In the revised manuscript, we will expand §3 to include a description of the labeling process, including the involvement of domain experts in urban studies, the iterative development of the HUSIC hierarchy, inter-annotator agreement scores computed on a sample of annotations, and bias mitigation strategies such as diverse annotator backgrounds and consensus-based labeling. This will be added without altering the core claims. revision: yes
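Inter-annotator agreement of the kind promised here is conventionally reported as Cohen's κ for pairs of raters, with the Landis and Koch scale (cited in the reference list) for interpretation. A self-contained sketch on invented toy labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Expected agreement if both raters labeled independently at random
    # according to their own marginal label frequencies.
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy HUSIC-style labels from two hypothetical annotators.
a = ["activated", "activated", "consumption", "portrait", "non_spatial"]
b = ["activated", "non_activated", "consumption", "portrait", "non_spatial"]
print(round(cohens_kappa(a, b), 3))  # → 0.75
```

For more than two annotators, Fleiss' κ or Krippendorff's α would be the usual substitutes.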
-
Referee: [§3 and §5] §3 (Dataset Curation) and §5 (Experiments): The 2M Weibo corpus and its subsets are asserted to represent typical urban spaces across 24 cities, but no quantitative validation (e.g., comparison to official land-use maps, demographic controls, or multi-source cross-checks) is reported; this leaves the representativeness claim vulnerable to known social-media selection biases toward salient/positive content.
Authors: We acknowledge the limitation regarding quantitative validation of representativeness. The Weibo data inherently carries selection biases as user-generated content. In the revision, we will add a new subsection in §3 discussing these biases explicitly, including any available comparisons (e.g., city-level image distribution vs. population data), and clarify that the dataset serves as a benchmark for social media perceptions of urban spaces rather than a statistically representative sample of all urban environments. We will also include this in the limitations section. revision: partial
-
Referee: [§5] §5 (Multi-scale Study): Performance trends are shown as training data grows from 1K to 100K images, but the results lack statistical significance testing, confidence intervals, or ablation against stronger baselines, weakening the interpretation of scaling behavior for the three tasks.
Authors: We appreciate this suggestion for improving the rigor of our experimental analysis. In the updated §5, we will incorporate statistical significance testing (such as bootstrap confidence intervals and paired statistical tests) for the performance metrics across scales, report 95% confidence intervals, and perform additional ablations using stronger contemporary baselines (e.g., recent CLIP variants or segmentation models). These additions will better substantiate the scaling observations. revision: yes
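The promised bootstrap confidence intervals can be sketched as a percentile bootstrap over per-example correctness. The outcomes below are synthetic, and the 2,000-resample count is an arbitrary illustrative choice.

```python
import random

def bootstrap_ci(correct, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean accuracy over 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    means = []
    for _ in range(n_boot):
        # Resample n examples with replacement and record the mean accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic per-example correctness: 80 right out of 100.
outcomes = [1] * 80 + [0] * 20
lo, hi = bootstrap_ci(outcomes)
print(lo, hi)
```

Paired comparisons between two models would bootstrap the per-example difference in correctness instead, which is what a paired significance test on the same evaluation set amounts to.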
Circularity Check
No circularity: dataset and benchmark construction is self-contained
The paper introduces a new multi-modal dataset from Weibo imagery, defines the HUSIC taxonomy from urban theory literature, and specifies three independent benchmark tasks (semantic classification, cross-modal retrieval, instance segmentation) with standard model evaluations. No equations, fitted parameters, predictions, or derivations are present that could reduce to inputs by construction. The central contribution is data curation and task formulation rather than any self-referential result; external benchmarks and model comparisons are performed on off-the-shelf architectures without load-bearing self-citations or ansatz smuggling.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: the HUSIC taxonomy is grounded in urban theory and distinguishes key urban space types
invented entities (1)
- HUSIC (Hierarchical Urban Space Image Classification) framework: no independent evidence
Reference graph
Works this paper leans on
-
[1]
Dragomir Anguelov, Carole Dulong, Daniel Filip, Christian Frueh, Stéphane Lafon, Richard Lyon, Abhijit Ogale, Luc Vincent, and Josh Weaver. Google Street View: Capturing the world at street level. Computer, 43(6):32–38, 2010
-
[2]
Mary Jo Bitner. Servicescapes: The impact of physical surroundings on customers and employees. Journal of Marketing, 56(2):57–71, 1992. doi: 10.1177/002224299205600205
-
[3]
John D. Boy and Justus Uitermark. Reassembling the city through Instagram. Transactions of the Institute of British Geographers, 42(2):612–624, 2017. doi: 10.1111/tran.12185
-
[4]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwi...
-
[5]
Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(5):1483–1498, 2021. doi: 10.1109/TPAMI.2019.2956516
-
[6]
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3213–3223, 2016. doi: 10.1109/CVPR.2016.350
-
[7]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009. doi: 10.1109/CVPR.2009.5206848
-
[8]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021
-
[9]
Abhimanyu Dubey, Nikhil Naik, Devi Parikh, Ramesh Raskar, and César A. Hidalgo. Deep learning the city: Quantifying urban perception at a global scale. In European Conference on Computer Vision (ECCV), pages 196–212. Springer, 2016. doi: 10.1007/978-3-319-46448-0_12
-
[10]
Jan Gehl. Life Between Buildings: Using Public Space. Island Press, Washington, DC, 6th edition, 2011
-
[11]
Erving Goffman. The Presentation of Self in Everyday Life. Anchor Books, New York, 1959
-
[12]
Agrim Gupta, Piotr Dollar, and Ross Girshick. LVIS: A dataset for large vocabulary instance segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5356–5364, 2019. doi: 10.1109/CVPR.2019.00550
-
[13]
Jun He, Yi Lin, Zilong Huang, Jiacong Yin, Junyan Ye, Yuchuan Zhou, Weijia Li, and Xiang Zhang. UrbanFeel: A comprehensive benchmark for temporal and perceptual understanding of city scenes through human perspective. arXiv preprint arXiv:2509.22228, 2025. URL https://arxiv.org/abs/2509.22228
-
[14]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016. doi: 10.1109/CVPR.2016.90
-
[15]
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In IEEE International Conference on Computer Vision (ICCV), pages 2961–2969, 2017. doi: 10.1109/ICCV.2017.322
-
[16]
Bill Hillier and Julienne Hanson. The Social Logic of Space. Cambridge University Press, Cambridge, 1984
-
[17]
Nadav Hochman and Lev Manovich. Zooming into an Instagram city: Reading the local through social media. First Monday, 18(7), 2013. doi: 10.5210/fm.v18i7.4711
-
[18]
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020
-
[19]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything. In IEEE International Conference on Computer Vision (ICCV), pages 4015–4026, 2023. doi: 10.1109/ICCV51070.2023.00371
-
[20]
J. Richard Landis and Gary G. Koch. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174, 1977. doi: 10.2307/2529310
-
[21]
Henri Lefebvre. The Production of Space. Blackwell, Oxford, 1991. Translated by D. Nicholson-Smith
-
[22]
Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning (ICML), pages 12888–12900, 2022
-
[23]
Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning (ICML), 2023
-
[24]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), pages 740–755. Springer, 2014. doi: 10.1007/978-3-319-10602-1_48
-
[25]
Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning (LLaVA). In Advances in Neural Information Processing Systems (NeurIPS), 2024
-
[26]
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, and Lei Zhang. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023
-
[27]
Kevin Lynch. The Image of the City. MIT Press, Cambridge, MA, 1960
-
[28]
Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulò, and Peter Kontschieder. The Mapillary Vistas dataset for semantic understanding of street scenes. In IEEE International Conference on Computer Vision (ICCV), pages 4990–4999, 2017. doi: 10.1109/ICCV.2017.534
-
[29]
Oscar Newman. Defensible Space: Crime Prevention through Urban Design. Macmillan, New York, 1972
-
[30]
Yiwei Ou, Xiaobin Ren, Ronggui Sun, Guansong Gao, Kaiqi Zhao, and Manfredo Manfredini. MMS-VPR: Multimodal street-level visual place recognition dataset and benchmark. arXiv preprint arXiv:2505.12254, 2025
-
[31]
Zoltán Peredy, Sijia Li, and László Vígh. Chinese city tier ranking scheme as special spatial factor of innovations diffusion.International Review, (1-2):88–99, 2024
-
[32]
Ariadna Quattoni and Antonio Torralba. Recognizing indoor scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 413–420, 2009. doi: 10.1109/CVPR.2009.5206537
-
[33]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), pages 8748–8763, 2021
-
[34]
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024
-
[35]
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, and Jenia Jitsev. LAION-5B: An open large-scale dataset for training next generation image-text models. In...
-
[36]
Mingxing Tan and Quoc Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning (ICML), pages 6105–6114, 2019
-
[37]
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training data-efficient image transformers & distillation through attention (DeiT). In International Conference on Machine Learning (ICML), pages 10347–10357, 2021
-
[38]
William H. Whyte. The Social Life of Small Urban Spaces. Conservation Foundation, Washington, DC, 1980
-
[39]
Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3485–3492, 2010. doi: 10.1109/CVPR.2010.5539970
-
[40]
Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67–78, 2014. doi: 10.1162/tacl_a_00166
-
[41]
Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2018. doi: 10.1109/TPAMI.2017.2723009
-
[42]
Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision, 127(3):302–321, 2019. doi: 10.1007/s11263-018-1140-0
Limitations noted in the paper
- Class-agnostic T3 evaluation: instance segmentation is benchmarked under a single object category; a per-class breakdown would require higher-quality human-annotated ground truth than the current pseudo-labels support.
- Geographic restriction to Chinese cities: all 61 venues are located across 24 Chinese cities; whether the HUSIC taxonomy and learned representations generalise to other cultural or urban contexts requires future geographic expansion.
- Class imbalance in the 2M corpus: the full 2M corpus is class-imbalanced by construction, reflecting real-world social-media frequency distributions (non-spatial classes each comprise ≈15–25% of posts; all spatially relevant classes collectively ≈40%); researchers requiring balanced training at scale should use the 100K tier.
- Incomplete LLaVA-1.5 100K training: 100K fine-tuning of LLaVA-1.5 was not completed due to computational constraints; 1K and 10K results are reported but 100K results are unavailable.
- T3 SAM oracle circularity: the GT-box SAM oracle (AP = 0.749) partially reflects circularity, as evaluation pseudo-labels were generated by SAM and the oracle uses SAM with perfect box prompts; the Cascade Mask R-CNN and SAM box-refinement results, trained on noisy pseudo-labels and evaluated against stricter-threshold human-audited annotations, provide the ...
- Chinese-language social-media text: post-text retrieval operates on original Chinese Weibo posts; current baselines (CLIP, BLIP, BLIP-2) were pre-trained predominantly on English data, which partly explains the low absolute post-level retrieval scores and motivates future bilingual or multilingual urban-domain pre-training.