Knowledge Graphs

Exercise 4.3 - Cosine Similarity

To check whether two names might refer to the same real-world organization, one strategy is to measure the similarity between the documents that describe them. Cosine similarity measures the similarity between two vectors and is defined as the cosine of the angle between them, so highly similar documents have a cosine score close to 1. Which of the following might be a viable approach for converting a document into a vector for calculating cosine similarity?

a. Word embeddings of the words used in a document.
b. TF/IDF scores of the words used in a document.
c. TF/IDF scores of the bigrams used in a document.
d. None of the above.
e. Any of (a), (b) or (c).
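
For reference, the cosine similarity computation described in the question can be sketched in Python as below. This is a minimal illustration, not a prescribed solution: the function names `cosine_similarity` and `count_vectors` are hypothetical, and the raw term-count vectorization is an assumption chosen only to produce two vectors of equal length. It is deliberately not one of the options listed above.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # The angle is undefined for a zero vector; treat that case as "no similarity".
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def count_vectors(doc_a, doc_b):
    """Assumed vectorization: raw term counts over the shared vocabulary."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    # A sorted vocabulary gives both vectors the same dimension ordering.
    vocab = sorted(set(counts_a) | set(counts_b))
    return [counts_a[w] for w in vocab], [counts_b[w] for w in vocab]


vec_a, vec_b = count_vectors("Acme Corp makes anvils", "Acme Inc makes anvils")
score = cosine_similarity(vec_a, vec_b)
```

Vectors that point in the same direction score 1 regardless of their lengths, and documents with no terms in common score 0. Any other way of mapping a document to a fixed-length numeric vector can be passed to `cosine_similarity` in the same way.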