Assignment 2#

Due date: 23:59 06/11/24


Part 1 - Georeferencing (60 pts)#

For the first part of the assignment, you will perform geoparsing and geocoding tasks on the Seattle Airbnb Open Data. You will work with the reviews dataset, which contains a unique ID for each reviewer along with their detailed comments. The .csv file you downloaded from Moodle is already pre-processed: it contains the English-language reviews of the five most-reviewed listings.

1. Read in reviews_selected_en.csv as a pandas dataframe (5 pts)#

import pandas as pd
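A minimal sketch of this step. Since the Moodle download is not distributed here, the snippet below builds a tiny stand-in table with the same `comments` column so it runs on its own; in your notebook, replace it with a single `pd.read_csv("reviews_selected_en.csv")` call (adjust the path to wherever you saved the file).

```python
import io
import pandas as pd

# Stand-in for reviews_selected_en.csv so this sketch is self-contained;
# in the assignment, use: df = pd.read_csv("reviews_selected_en.csv")
csv_text = """reviewer_id,comments
101,"Great stay near Pike Place Market."
102,"Quiet spot, easy walk to Seattle Center."
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.columns.tolist())
```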

2. Load the spaCy English trained pipeline (5 pts)#

import spacy

3. Extract geo entities from the comments column and save the results to a new column geo_entities (20 pts)#

Consider the spaCy entity labels LOC, ORG, and GPE as geo entities here.
Hint: to avoid duplicates, you can use a set instead of a list to store the identified geo entities.

def extract_geo_entities(text):
    
    return ...

df['geo_entities'] = df['comments'].apply(extract_geo_entities)

4. Explore the identified geo_entities and pick 5 places of interest in Seattle (5 pts)#

five_poi = []
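One way to explore the results (a sketch; the `df` below is stand-in data for the `geo_entities` column from step 3, and the `five_poi` values are hypothetical examples to replace with your own picks) is to count how often each entity appears across all reviews and read off the most-mentioned places:

```python
from collections import Counter
import pandas as pd

# Stand-in for the geo_entities column produced in step 3,
# so this sketch runs on its own.
df = pd.DataFrame({"geo_entities": [{"Pike Place Market", "Seattle"},
                                    {"Space Needle", "Seattle"}]})

# Flatten the per-review sets and count mentions of each entity.
counts = Counter(e for entities in df["geo_entities"] for e in entities)
print(counts.most_common(5))

# Hypothetical picks -- replace with five places drawn from your own counts.
five_poi = ["Pike Place Market", "Space Needle", "Capitol Hill",
            "Seattle Center", "Kerry Park"]
```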

5. Geocode the 5 selected places and plot them on a map (25 pts)#

from geopy.geocoders import ...
import folium

Part 2 - Reading (40 pts)#

For the second part of the assignment, you will read the paper How Do People Describe Locations During a Natural Disaster: An Analysis of Tweets from Hurricane Harvey.

  1. Please write a brief summary (no more than 250 words) of why geoparsing is needed for the study, and list the similar previous works mentioned in the paper. (20 pts)


  2. What challenges need to be addressed in order to develop a toponym resolution model for disaster-related tweets? (20 pts)


Submission#

The assignment includes two parts. Please submit your answers to both parts as a single Jupyter Notebook file.