Assignment 4

Due date: 23:59 04/12/24


Part 1 - Individual Final Project Idea (30 pts)

  • Identify the data set you would like to work on for the final project, name its source, and describe the information it contains. (10 pts)


  • Briefly describe your motivation and project idea, including the tasks you would like to perform on your data set (e.g., place name recognition or geocoding). (20 pts)


Part 2 - Reading (20 pts)

For the second part of the assignment, reread the paper from Assignment 3: "An Empirical Study on the Names of Points of Interest and Their Changes with Geographic Distance".

  • Describe why the authors used word2vec in this work, and the advantages of the word2vec approach over the count-based vector approach. (10 pts)


  • Write down the dimensionality of the word2vec embeddings generated in this work and the name of the parameter that sets the dimensionality in the Word2Vec() function (used in our lab session). (5 pts)


  • State the measure used to compute the similarity between two embeddings in this work, and the range of the similarity values it produces (according to Figure 7). (5 pts)
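
For reference, the "count-based vector approach" mentioned in the first question above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method used in the paper: the POI names are invented toy data, and the function name cooccurrence_vectors and its window parameter are hypothetical choices made for this example.

```python
from collections import Counter

# Toy POI names, invented for illustration only (not taken from the paper).
poi_names = [
    "golden dragon chinese restaurant",
    "dragon palace restaurant",
    "sunrise coffee shop",
    "corner coffee house",
]


def cooccurrence_vectors(names, window=2):
    """Build one co-occurrence count vector per word (hypothetical helper).

    Each vector has one entry per vocabulary word; entry k counts how often
    vocabulary word k appears within `window` tokens of the target word.
    """
    tokenized = [name.split() for name in names]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    counts = {word: Counter() for word in vocab}
    for tokens in tokenized:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[word][tokens[j]] += 1
    vectors = {word: [counts[word][ctx] for ctx in vocab] for word in vocab}
    return vocab, vectors


vocab, vectors = cooccurrence_vectors(poi_names)
print(vocab)
print(vectors["dragon"])
```

Each word is represented here by raw context counts, with one vector entry per vocabulary word.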