where the tail of a string x is the string consisting of all but the
first character of x, and x[n] is the nth character of x, counting
from 0.
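As an illustration, the recursive definition can be sketched in Python. This assumes the definition above is the standard Levenshtein recurrence with unit costs for insertion, deletion, and substitution. The function name edit_distance and the use of index offsets in place of explicit tail strings are choices made for this sketch, not part of the definition.

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, assuming unit cost for every edit operation."""

    @lru_cache(maxsize=None)
    def lev(i: int, j: int) -> int:
        # a[i:] and b[j:] play the role of the tails in the definition.
        if i == len(a):
            return len(b) - j  # insert the rest of b
        if j == len(b):
            return len(a) - i  # delete the rest of a
        if a[i] == b[j]:
            return lev(i + 1, j + 1)  # first characters match: no cost
        return 1 + min(
            lev(i + 1, j),      # delete a[i]
            lev(i, j + 1),      # insert b[j]
            lev(i + 1, j + 1),  # substitute a[i] with b[j]
        )

    return lev(0, 0)


print(edit_distance("kitten", "sitting"))  # prints 3
```

Memoizing on the pair of offsets keeps the recursion from recomputing the same tails, so the sketch runs in time proportional to the product of the two string lengths.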
We can define the Jaccard measure between two strings a and b,
each treated as a set of elements (for example, its tokens), as
the size of their intersection divided by the size of their
union:
J(a,b) = |a ∩ b| / |a ∪ b|
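A minimal sketch of the formula follows. The text does not say how a string becomes a set, so this sketch assumes lowercased, whitespace-separated tokens; character sets or n-grams would be equally valid choices and give different values. The function name jaccard and the sample company names are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard measure over token sets (tokenization is an assumption)."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    union = set_a | set_b
    if not union:
        # Both strings are empty; returning 1.0 is a convention chosen here.
        return 1.0
    return len(set_a & set_b) / len(union)


# Tokens {acme, widget, company} and {acme, widget, co}:
# 2 shared tokens out of 4 distinct tokens overall.
print(jaccard("Acme Widget Company", "Acme Widget Co"))  # prints 0.5
```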
a. Given the strings "JP Morgan Chase" and "JPMC Corporation", what is the edit distance between the two?
b. Given the strings "JP Morgan Chase" and "JPMC Corporation", what is the Jaccard measure between the two?
c. Given three strings, x = "Apple Corporation, CA", y = "IBM Corporation, CA", and z =
"Apple Corp", which of these strings would be equated by the edit distance method?
d. Given three strings, x = "Apple Corporation, CA", y = "IBM Corporation, CA", and z =
"Apple Corp", which of these strings would be equated by the Jaccard measure?
e. Given three strings, x = "Apple Corporation, CA", y = "IBM Corporation, CA", and z =
"Apple Corp", what intuition would you use to ensure that x is equated to z?