Place Pulse 2.0 was introduced by Dubey et al. in "Deep Learning the City: Quantifying Urban Perception at a Global Scale".
Place Pulse is a crowdsourcing effort that aims to map which areas of a city are perceived as safer, livelier, wealthier, more active, more beautiful, and friendlier. By asking users to choose between pairs of images, Place Pulse collected more than 1.5 million pairwise judgments covering more than 100,000 images from 56 cities.
From Place Pulse 2.0, we extracted only the images corresponding to pedestrian roads, for a total of 8,246 images from 40 cities. You can download the images from this link.
The goal of this project is to analyze street views of roads, evaluate their aesthetic appeal and cleanliness, and mark areas that need improvement on a map. To achieve this, we built and compared the performance of various models for predicting scores based on images.
- Baseline Model: Predicts scores directly from the raw images.
- Semantic Segmentation-Based Model: Segments objects in images and utilizes this information for prediction.
- Prompt-Based Model: Generates textual descriptions of images as prompts for prediction.
The scores, which originally range from 1 to 10, were binned into three classes to train a classification model:
- 0 (Dissatisfied): between 1 and 4
- 1 (Neutral): between 4 and 7
- 2 (Satisfied): between 7 and 10
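The binning above can be sketched as a small mapping function. Assigning the boundary values 4 and 7 to the upper bin is an assumption, since the ranges as stated overlap at the endpoints:

```python
def score_to_class(score: float) -> int:
    """Bin a 1-10 perception score into 3 satisfaction classes.

    Putting the boundary scores 4 and 7 in the upper bin is an
    assumption; the original binning leaves the endpoints ambiguous.
    """
    if score < 4:
        return 0  # dissatisfied
    if score < 7:
        return 1  # neutral
    return 2      # satisfied

print([score_to_class(s) for s in (2.5, 5.0, 8.3)])  # [0, 1, 2]
```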
Because the majority of scores fell between 4 and 6, the models learned to predict 'neutral (1)' almost exclusively. As a result, accuracy was high, but the models were not useful in practice.
| model | class | accuracy | f1_score |
|---|---|---|---|
| baseline | beautiful | 0.8170 | 0.7347 |
| baseline | clean | 0.8267 | 0.7510 |
| segment | beautiful | 0.8024 | 0.7320 |
| segment | clean | 0.8200 | 0.7508 |
| prompt | beautiful | 0.8170 | 0.7347 |
| prompt | clean | 0.8297 | 0.7525 |
| model | beautiful | clean |
|---|---|---|
| baseline | [[0, 260, 0], [0, 1348, 0], [0, 42, 0]] | [[0, 61, 0], [0, 1364, 5], [0, 220, 0]] |
| segment | [[4, 255, 1], [27, 1320, 1], [1, 41, 0]] | [[0, 61, 0], [1, 1350, 18], [0, 217, 3]] |
| prompt | [[0, 260, 0], [0, 1348, 0], [0, 42, 0]] | [[0, 61, 0], [0, 1369, 0], [0, 220, 0]] |
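The metrics can be reproduced from these confusion matrices. For the baseline 'beautiful' model (assuming rows are true classes and columns are predicted classes), plain accuracy and support-weighted F1 match the reported 0.8170 and 0.7347, which suggests the f1_score column is a weighted average:

```python
# Confusion matrix for the baseline model on 'beautiful', copied from the
# table above (rows = true class, columns = predicted class is an assumption).
cm = [[0,  260, 0],
      [0, 1348, 0],
      [0,   42, 0]]

def accuracy(cm):
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def weighted_f1(cm):
    """Per-class F1 averaged with each class's true-instance count as weight."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    score = 0.0
    for c in range(n):
        tp = cm[c][c]
        support = sum(cm[c])                           # true instances of class c
        predicted = sum(cm[r][c] for r in range(n))    # predictions of class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += support * f1
    return score / total

print(round(accuracy(cm), 4), round(weighted_f1(cm), 4))  # 0.817 0.7347
```

With every prediction in the neutral column, the dissatisfied and satisfied classes get an F1 of zero, yet the dominant neutral class keeps both metrics looking respectable — the practical-uselessness problem described above.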
| train | validation | test |
|---|---|---|
| 5772 | 824 | 1650 |
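The split sizes above sum to the full 8,246-image set; computing the ratios shows this is approximately a 70/10/20 split. The ratio interpretation is an inference, not stated in the original:

```python
# Split sizes copied from the table above; the 70/10/20 reading is inferred.
splits = {"train": 5772, "validation": 824, "test": 1650}
total = sum(splits.values())
assert total == 8246  # matches the number of pedestrian-road images extracted

for name, n in splits.items():
    print(f"{name}: {n} ({n / total:.1%})")
# train: 5772 (70.0%)
# validation: 824 (10.0%)
# test: 1650 (20.0%)
```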
- HRNet (semantic segmentation): https://github.com/CSAILVision/semantic-segmentation-pytorch
- LLaVA (image → prompt): https://github.com/camenduru/LLaVA
```shell
python baseline.py
python segment.py
python prompt.py
```