\documentclass{article}
\usepackage[pdftex]{graphicx}
\usepackage{amsfonts}
\usepackage{amsmath, amsthm, amssymb}
\usepackage{moreverb}
\title{CS 246: Problem Set 1}
\author{Tony Hyun Kim}
\setlength{\parindent}{0pt}
\setlength\parskip{0.1in}
\setlength\topmargin{0in}
\setlength\headheight{0in}
\setlength\headsep{0in}
\setlength\textheight{8.2in}
\setlength\textwidth{6.5in}
\setlength\oddsidemargin{0in}
\setlength\evensidemargin{0in}

\pdfpagewidth 8.5in
\pdfpageheight 11in

\begin{document}

\maketitle

\section{MapReduce}

\subsection{Friend Recommender}

The source code of my FriendRecommend system can be found in the attachment to the assignment. My algorithm chains two MapReduce steps:
\begin{itemize}
	\item \verb=Map1=: For a user $i$ with friends $\left\{f_j\right\}$, emit all unique pairs $(f_j,f_k)$ that are connected through user $i$ in the $(\mathrm{key},\mathrm{value})$ form $((f_j,f_k),i)$.
	\item \verb=Reduce1=: For each mediated connection (of length $2$) between users $(j,k)$, count the number of shared friends $M_{jk}$. Note: the pair $(j,k)$ is filtered in this step if there is a direct connection between $j$ and $k$ by using a special flag from \verb=Map1=.
	\item \verb=Map2=: The idea is to re-group the previous output by user ID. For each input $((j,k), M_{jk})$, emit $(j,(M_{jk},k))$ and $(k,(M_{jk},j))$.
	\item \verb=Reduce2=: For each user $j$, iterate through potential friend candidates $\left\{(M_{jk},k)\right\}$. Here, I implemented a priority queue that keeps track of the top $N=10$ candidates, sorted in decreasing order by $M_{jk}$.
\end{itemize}
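The two-stage design above can be sketched in plain Python, with dictionaries standing in for the MapReduce shuffle (the adjacency list and helper names are illustrative assumptions, not the assignment's data):

```python
from collections import defaultdict
import heapq

# Toy adjacency list (hypothetical data, not the assignment's input).
friends = {
    1: [2, 3, 4],
    2: [1, 3],
    3: [1, 2],
    4: [1],
}

# Map1/Reduce1: count shared friends for each pair (j, k) mediated by some user i.
common = defaultdict(int)
for i, fs in friends.items():
    for a in range(len(fs)):
        for b in range(a + 1, len(fs)):
            j, k = min(fs[a], fs[b]), max(fs[a], fs[b])
            common[(j, k)] += 1
# Filter pairs that are already directly connected (the Reduce1 flag).
common = {p: m for p, m in common.items() if p[1] not in friends[p[0]]}

# Map2/Reduce2: regroup by user and keep the top-N candidates per user.
N = 10
recs = defaultdict(list)
for (j, k), m in common.items():
    recs[j].append((m, k))
    recs[k].append((m, j))
top = {u: heapq.nlargest(N, cands) for u, cands in recs.items()}
```

Here `heapq.nlargest` plays the role of the priority queue in `Reduce2`, sorting candidates in decreasing order of the common-friend count.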
	
Below is the list of $10$ pairs of users that are not directly connected but have the most friends in common. This information emerges naturally from the intermediate output of the above algorithm; I use a short Matlab script to scan for the largest \verb=NumberOfCommonFriends=.
\begin{verbatim}
<User1> <User2> <NumberOfCommonFriends>
18739 18740 100
31506 31530 99
31492 31511 96
31511 31556 96
31519 31554 96
31519 31568 96
31533 31559 96
31555 31560 96
31492 31556 95
31503 31537 95
\end{verbatim}

\subsection{Finding paths of length $5$ in an undirected graph}

The following MapReduce algorithm performs a join on the graph paths provided by two input files, identified here by \verb=f1= and \verb=f2=. Beginning with a list of first-degree edges, we perform multiple passes of the following pseudocode to determine all paths of length $L=5$. The pseudocode uses Matlab-style (one-based) indexing.

\begin{verbatimtab}[4]
Map(
  n   // List of nodes in a single path (one line of an input file).
  f   // File identifier: 'f1' if the path comes from file 1, 'f2' if from file 2.
  )
  emit(n[1],  (f,n[2:end]))              // Path in original order
  emit(n[end],(f,reverse(n[1:(end-1)]))) // Path in reverse order

Reduce(
  key // First node of each path in this group
  vs  // List of (file identifier, rest of path) pairs
  )
  for i = 1:(vs.len-1)
    for j = (i+1):vs.len
      if(vs[i][1] != vs[j][1])   // Join segments only from different files
        jpath = [reverse(vs[j][2]) key vs[i][2]]
        if(!duplicates(jpath))   // Filter paths with repeated nodes (loops)
          write(jpath[1],jpath[2:end]) // Write 'jpath' to the output file
\end{verbatimtab}
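One pass of this join can be sketched as runnable Python (the toy edge list is illustrative; zero-based indexing replaces the Matlab-style indexing of the pseudocode):

```python
from collections import defaultdict

def map_paths(path, f):
    """Emit the path keyed by each endpoint (reversing for the tail key)."""
    yield path[0], (f, tuple(path[1:]))
    yield path[-1], (f, tuple(reversed(path[:-1])))

def reduce_join(key, vs):
    """Join segments from different files at the shared node 'key'."""
    for i in range(len(vs)):
        for j in range(i + 1, len(vs)):
            if vs[i][0] != vs[j][0]:  # only join across files
                jpath = list(reversed(vs[j][1])) + [key] + list(vs[i][1])
                if len(set(jpath)) == len(jpath):  # no repeated nodes (loops)
                    yield jpath

# Join length-1 paths (edges) with themselves to obtain length-2 paths.
f1 = [[1, 2], [2, 3], [3, 4]]
groups = defaultdict(list)
for f, paths in (('f1', f1), ('f2', f1)):
    for p in paths:
        for k, v in map_paths(p, f):
            groups[k].append(v)
paths2 = {tuple(p) for k, vs in groups.items() for p in reduce_join(k, vs)}
```

On the toy 3-edge chain this produces the two length-2 paths 1--2--3 and 2--3--4 (each possibly in both orientations, since the graph is undirected).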

\subsection{Paths of length $L$}

Let the files \verb=f1= and \verb=f2= contain paths of lengths $l_1$ and $l_2$ respectively. A single MapReduce step (as outlined above) produces an output that contains the paths of length $l_1+l_2$. This fact allows us to determine the minimum number of MapReduce jobs required to compute all paths of length $L$.

We begin with a file \verb=f1= that contains paths of length $1$. In the first MapReduce iteration, we feed in two copies of \verb=f1= and obtain a file \verb=f2= that contains paths of length $2$. In the next step, we may join \verb=f1= with \verb=f2=, or \verb=f2= with itself, yielding paths of length $3$ or $4$ respectively. In general, each subsequent step can join any pair of path files that have been computed up to that point.

If the desired path length is a power of $2$, it is clear that a minimum of $m = \log_2(L)$ steps is required. If $L$ is not a power of two, the following code snippet computes the required number of steps $m$:
\begin{verbatimtab}
MinimumSteps(L)
  m = floor(log2(L))
  L = L - 2^m
  while(L!=0)
    k = floor(log2(L))
    L = L - 2^k
    m = m + 1
  return m
\end{verbatimtab}

For example, it takes $m=3$ MapReduce steps to compute $L=5$.
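The pseudocode above translates directly into Python; equivalently, $m$ is $\lfloor\log_2 L\rfloor$ (one doubling per binary digit position) plus one extra join for each additional set bit of $L$:

```python
def minimum_steps(L):
    """Number of MapReduce joins needed to build paths of length L from edges."""
    m = L.bit_length() - 1   # floor(log2(L)): the doubling steps
    L -= 1 << m
    while L != 0:            # one extra join per remaining set bit
        k = L.bit_length() - 1
        L -= 1 << k
        m += 1
    return m
```

For instance, `minimum_steps(5)` returns $3$ ($1\rightarrow2\rightarrow4\rightarrow5$), and `minimum_steps(8)` returns $3$ (pure doubling).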

\subsection{Matrix multiplication $AA^T$}

Let the matrix $A$ be divided into blocks, with $n$ blocks along the rows and $m$ blocks along the columns. Denote by $A_{I,J}$ the $(I,J)$-th sub-block. We wish to compute
\begin{equation}
	C = A\cdot A^T \quad\Longleftrightarrow\quad C_{I,K} = \sum_J A_{I,J}\cdot (A_{K,J})^T,
\end{equation}
where $(A_{K,J})^T$ denotes the transpose of the $(K,J)$-th sub-block of the original matrix $A$.

We will compute $C = A\cdot A^T$ by the following three MapReduce steps. Note that the first step simply parses the matrix $A$ into sub-blocks and may be omitted if the input file is properly formatted along the block structure. The following two steps are conceptually similar to the two-step matrix multiplication algorithm discussed in the course textbook in Section~2.3.10.
\begin{verbatimtab}[4]
Map1(
	i,j,aij // An element in the original matrix A
	n,m		// Block counts
	)
	[I J i' j'] = GlobalToBlockIndex(i,j,n,m) // Block index I,J; Sub-block index i',j'
	emit((I,J),(i',j',aij))
	
Reduce1(key,vs)
	emit(key,vs) // Identity function
	
Map2(
	I,J,AIJ // Block index I,J and the corresponding block matrix
	)
	for k = 1:n
		emit((I,J,k),(`A',AIJ))
	AIJT = AIJ' // Take the transpose of AIJ.
	for k = 1:n
		emit((k,J,I),(`At',AIJT))
		
Reduce2(key,vs)    // vs holds one `A' block and one `At' block, in either order
    emit(key,A*At) // Sub-block multiplication possible on a single node
			
Map3(
	(I,J,K),(AAT)
	)
	emit((I,K),AAT)
	
Reduce3(key,vs)
	emit(key,sum(vs)) // Result is the (I,K)-th block of C = A*A'
\end{verbatimtab}
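The three steps can be simulated in memory to check the block algebra; below is a sketch with a toy $4\times4$ matrix and $n=m=2$ (the helper functions and values are illustrative, not part of the assignment code):

```python
from collections import defaultdict

def mat_T(M):
    """Transpose of a 2-D list."""
    return [list(r) for r in zip(*M)]

def mat_mul(X, Y):
    """Product of two 2-D lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def mat_add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Toy matrix A pre-split into 2x2 blocks (the output of Map1/Reduce1).
n = 2
blocks = {
    (1, 1): [[1, 2], [3, 4]], (1, 2): [[5, 6], [7, 8]],
    (2, 1): [[9, 0], [1, 2]], (2, 2): [[3, 4], [5, 6]],
}

# Map2: each block is emitted n times tagged `A', and its transpose is
# routed so that key (I,J,K) collects A_{I,J} together with (A_{K,J})^T.
pairs = defaultdict(dict)
for (I, J), AIJ in blocks.items():
    for k in range(1, n + 1):
        pairs[(I, J, k)]['A'] = AIJ
        pairs[(k, J, I)]['At'] = mat_T(AIJ)

# Reduce2 (block product) followed by Map3/Reduce3 (sum over J).
C = defaultdict(lambda: [[0, 0], [0, 0]])
for (I, J, K), v in pairs.items():
    C[(I, K)] = mat_add(C[(I, K)], mat_mul(v['A'], v['At']))
```

Each `C[(I, K)]` then equals the $(I,K)$-th block of $A\cdot A^T$, matching the block formula $C_{I,K}=\sum_J A_{I,J}\cdot(A_{K,J})^T$.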

\subsection{Matrix multiplication tradeoffs}

First, we note that given $n$ blocks along the rows and $m$ blocks along the columns, the block indices $(I,J)$ range over $I=1,2,\cdots,n$ and $J=1,2,\cdots,m$. With this in mind, we can compute the number of mappers and reducers in the three steps as a function of $n$ and $m$.
\begin{itemize}
	\item \verb=Map1=: Not applicable. One mapper per nonzero entry of $A$ assuming standard matrix formatting.
	\item \verb=Red1=: $nm$ reducers. One for each matrix sub-block.
	\item \verb=Map2=: $nm$ mappers. One for each matrix sub-block.
	\item \verb=Red2=: $n^2m$ reducers. Each sub-block $A_{I,J}$ is multiplied $n$ times, and there are $nm$ sub-blocks in all.
	\item \verb=Map3=: Same as \verb=Red2=; $n^2m$ mappers.
	\item \verb=Red3=: $nm$ reducers. One for each sub-block of output.
\end{itemize}

Each element of the matrix $A$ is copied $n$ times over the network as a block of $A$, and another $n$ times as a (transposed) block of $A^T$. This occurs in the second MapReduce step, where each matrix sub-block is duplicated to $n$ separate nodes in each role.

With $nm$ sub-blocks, each block contains $N^2/(nm)$ entries, where $N\times N$ is the size of the original matrix $A$. Since my block multiplication algorithm never requires local storage of more than two sub-blocks (except for \verb=Red3=, which performs a single-pass accumulation over the partial products), the local memory requirement scales as $1/(nm)$.

\section{Association rules}

\subsection{A-priori algorithm}

I have implemented two Java programs that generate association rules for item sets of size $2$ and $3$, namely:
\begin{itemize}
	\item \verb=FrequentItemsets=: a three-pass MapReduce program that generates the supports for frequent singletons, pairs and triplets.
	\item \verb=AssociationRules=: takes the output of \verb=FrequentItemsets= (\emph{i.e.} the supports for singletons, pairs and triplets) and generates all possible association rules and their confidences.
\end{itemize}
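The rule-generation step of \verb=AssociationRules= can be sketched as follows (the support counts below are toy values, not the assignment's data; the real program reads the \verb=FrequentItemsets= output):

```python
from itertools import combinations

# Toy support counts, keyed by sorted tuples of tokens (hypothetical values).
support = {
    ('thou',): 100, ('shalt',): 40, ('wilt',): 10,
    ('shalt', 'thou'): 39, ('thou', 'wilt'): 10,
}

# For each frequent itemset, generate every rule LHS -> RHS and score it by
# confidence conf(LHS -> RHS) = S(LHS u RHS) / S(LHS).
rules = []
for itemset, s in support.items():
    if len(itemset) < 2:
        continue
    for r in range(1, len(itemset)):
        for lhs in combinations(itemset, r):
            rhs = tuple(x for x in itemset if x not in lhs)
            rules.append((lhs, rhs, s / support[lhs]))

rules.sort(key=lambda rule: -rule[2])
```

With these toy counts, the top rule is `wilt -> thou` with confidence $1.0$, mirroring the shape of the real output below.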

The top five rules in decreasing order of confidence, for item sets of size $2$ and $3$ respectively:
\begin{verbatim}
wilt   --> thou (conf = 1.0000)
shalt  --> thou (0.9992)
mayest --> thou (0.9907)
didst  --> thou (0.9906)
art    --> thou (0.9867)
hast,thine  --> thou (1.0000)
shalt,thine --> thou (1.0000)
go,shalt    --> thou (1.0000)
out,shalt   --> thou (1.0000)
do,shalt    --> thou (1.0000)
\end{verbatim}

The confidence ranking is dominated by association rules that involve \verb=thou=. This is an artifact of the token \verb=thou= occurring with high support, independent of the other words.

\subsection{Alternate evaluations of association rules}

\subsubsection{Lift measure}

The top $5$ rules in decreasing order of the lift measure, for item sets of size $2$ and $3$. Note that the lift measure is symmetric in $A$ and $B$, \emph{i.e.} $\mathrm{lift}(A\rightarrow B)=\mathrm{lift}(B\rightarrow A)$. In the following list, I present $10$ distinct $(A,B)$ pairs; each entry should be interpreted as both $A\rightarrow B$ and $B\rightarrow A$.
\begin{verbatim}
book  <--> written   (lift = 67.6397)
about <--> round     (50.9435)
gold  <--> silver    (50.1028)
burnt <--> offerings (47.4101)
cut   <--> off       (45.4186)
spake  <--> moses,saying (40.4831)
christ <--> jesus,lord   (35.3443)
christ <--> god,jesus    (34.1711)
thus   <--> israel,saith (30.4210)
jesus  <--> christ,lord  (29.8515)
\end{verbatim}

\subsubsection{Conviction measure}

An obvious disadvantage of the lift measure is that it is symmetric in $A$ and $B$ which precludes any causal interpretation between $A$ and $B$. On the other hand, the conviction measure defined as
\begin{equation}
	\mathrm{conv}(A\rightarrow B) = \frac{1-S(B)}{1-\mathrm{conf}(A\rightarrow B)} = \frac{1-S(B)}{1-\frac{S(A\cup B)}{S(A)}}
\end{equation}
is clearly asymmetric in $A$ and $B$.

Additionally, the conviction measure penalizes the rule $A\rightarrow B$ if the support for $B$ is independently high [through the numerator $1-S(B)$]. Hence the conviction measure is not arbitrarily inflated if the set $B$ simply enjoys large support in the given dataset as was the case for the confidence measure.
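All three measures derive from the same support values, using $\mathrm{conf}(A\rightarrow B)=S(A\cup B)/S(A)$ and $\mathrm{lift}(A\rightarrow B)=\mathrm{conf}(A\rightarrow B)/S(B)$ together with the conviction formula above. A small sketch with illustrative fractional supports:

```python
import math

def confidence(s_ab, s_a):
    """conf(A -> B) = S(A u B) / S(A)."""
    return s_ab / s_a

def lift(s_ab, s_a, s_b):
    """lift(A -> B) = conf(A -> B) / S(B); symmetric in A and B."""
    return s_ab / (s_a * s_b)

def conviction(s_ab, s_a, s_b):
    """conv(A -> B) = (1 - S(B)) / (1 - conf(A -> B)); asymmetric."""
    c = confidence(s_ab, s_a)
    return math.inf if c == 1 else (1 - s_b) / (1 - c)

# Illustrative fractional supports: B is independently very common.
s_a, s_b, s_ab = 0.2, 0.8, 0.19

conf_ab = confidence(s_ab, s_a)       # high, inflated by the popularity of B
conv_ab = conviction(s_ab, s_a, s_b)  # tempered by the 1 - S(B) numerator
conv_ba = conviction(s_ab, s_b, s_a)  # differs from conv_ab: asymmetric
```

Note how the $1-S(B)$ numerator keeps the conviction of $A\rightarrow B$ modest even when the confidence is near $1$ solely because $B$ is common.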

\subsection{Relationship between rules}

\subsubsection{$i_1\rightarrow i_3$}

This potential rule certainly satisfies the support requirement, since $\left\{i_1,i_3\right\}$ is a subset of $\left\{i_1,i_2,i_3\right\}$. However, it \textbf{may or may not} have the threshold confidence.

\subsubsection{$i_1,i_2,i_4\rightarrow i_3$}

This \textbf{may or may not} be an association rule. There is no guarantee that the itemset $\{i_1,i_2,i_3,i_4\}$  has the necessary support. 

\subsubsection{$i_3\rightarrow i_5,i_6$}

This \textbf{may or may not} be an association rule. While $\{i_3,i_5,i_6\}$ has the necessary support (it is a subset of $\{i_3,i_4,i_5,i_6\}$), it is not guaranteed that the proposed rule has sufficient confidence.

\subsubsection{$i_3,i_4 \rightarrow i_5$}

Yes, this proposed rule \textbf{must} be an association rule. The itemset $\{i_3,i_4,i_5\}$ occurs with sufficient support since it is a subset of $\{i_3,i_4,i_5,i_6\}$. Furthermore, $\mathrm{conf}(i_3,i_4\rightarrow i_5) \geq \mathrm{conf}(i_3,i_4\rightarrow i_5,i_6)$, so the confidence threshold is also satisfied.

\subsubsection{$i_2,i_5,i_7 \rightarrow i_6$}

This \textbf{may or may not} be an association rule depending on why $i_2,i_5\rightarrow i_6,i_7$ is not an association rule. If the latter fails because of insufficient support, then the proposed rule will also fail. However, if the latter only has insufficient confidence, then it is possible (though not guaranteed) that $i_2,i_5,i_7 \rightarrow i_6$ will exceed the threshold confidence.

\section{Locality sensitive hashing}

\subsection{Necessary condition}

We wish to show that for a similarity function $\mathrm{sim}(x,y)$ to possess a locality-sensitive hashing scheme, the function $d(x,y) = 1-\mathrm{sim}(x,y)$ must satisfy the triangle inequality:
\begin{equation}
	d(x,y) + d(y,z) \geq d(x,z)
	\label{eq:triangle}
\end{equation}
for all $x$, $y$ and $z$.

We begin with a probabilistic interpretation of the function $d(x,y)$:
\begin{eqnarray}
	d(x,y) &=& 1-\mathrm{sim}(x,y)\nonumber\\
		   &=& 1-\mathbb{P}\left[h(x)=h(y)\right]\nonumber\\
		   &=& \mathbb{P}\left[h(x)\neq h(y)\right].
\end{eqnarray}
We thus see that Eq.~\ref{eq:triangle} is equivalent to the following statements:
\begin{eqnarray}
	\mathbb{P}[h(x)\neq h(y)] + \mathbb{P}[h(y)\neq h(z)] &\geq& \mathbb{P}[h(x)\neq h(z)]\\
	\mathbb{P}(A) + \mathbb{P}(B) &\geq& \mathbb{P}(C)\label{eq:inequality}
\end{eqnarray}
where we have defined the events $A$, $B$ and $C$ as: 
\begin{equation}
A = [h(x)\neq h(y)],\qquad B = [h(y)\neq h(z)],\qquad C = [h(x)\neq h(z)].
\end{equation}

Let us enumerate the total probability using the events $A$, $B$ and $C$:
\begin{center}
	\begin{tabular}{ccc|l}
		A   &  B  &  C  & $\mathbb{P}$\\
		\hline
		$0$ & $0$ & $0$ & $p_0 = \mathbb{P}(\bar{A}\cap\bar{B}\cap\bar{C})$\\
		$0$ & $0$ & $1$ & $p_1 = \mathbb{P}(\bar{A}\cap\bar{B}\cap{C})=0$\\
		$0$ & $1$ & $0$ & $p_2 = \mathbb{P}(\bar{A}\cap{B}\cap\bar{C})=0$\\
		$0$ & $1$ & $1$ & $p_3 = \mathbb{P}(\bar{A}\cap{B}\cap{C})$\\
		$1$ & $0$ & $0$ & $p_4 = \mathbb{P}({A}\cap\bar{B}\cap\bar{C})=0$\\
		$1$ & $0$ & $1$ & $p_5 = \mathbb{P}({A}\cap\bar{B}\cap{C})$\\
		$1$ & $1$ & $0$ & $p_6 = \mathbb{P}({A}\cap{B}\cap\bar{C})$\\
		$1$ & $1$ & $1$ & $p_7 = \mathbb{P}({A}\cap{B}\cap{C})$\\
	\end{tabular}
\end{center}
Note that the probabilities $p_1$, $p_2$ and $p_4$ are identically zero, since their associated event (the appropriate intersection of $A$, $B$ and $C$) is the null event. For instance, the event $\bar{A}\cap\bar{B}\cap{C}$ requires simultaneously $h(x)=h(y)$, $h(y)\neq h(z)$ and $h(x)=h(z)$ which is clearly impossible.

From the table, we then find:
\begin{eqnarray*}
	\mathbb{P}(A) &=& p_5+p_6+p_7\\
	\mathbb{P}(B) &=& p_3+p_6+p_7\\
	\mathbb{P}(C) &=& p_3+p_5+p_7
\end{eqnarray*}
from which the desired inequality of Eq.~\ref{eq:inequality} immediately follows.

\subsection{No locality-sensitive hashing scheme for Overlap similarity}

Let $A=\{x\}$, $B=\{x,y\}$ and $C=\{y\}$. It then follows that:
\begin{eqnarray*}
	\mathrm{sim}_\mathrm{Over}(A,B) &=& 1	 \quad\rightarrow\quad d(A,B) = 0\\
	\mathrm{sim}_\mathrm{Over}(B,C) &=& 1 \quad\rightarrow\quad d(B,C) = 0\\
	\mathrm{sim}_\mathrm{Over}(A,C) &=& 0 \quad\rightarrow\quad d(A,C) = 1
\end{eqnarray*}
Hence the overlap similarity does not satisfy the triangle inequality $d(A,B)+d(B,C)\geq d(A,C)$, and therefore cannot possess a locality-sensitive hashing scheme.

\subsection{No locality-sensitive hashing scheme for Dice similarity}

Again consider $A=\{x\}$, $B=\{x,y\}$ and $C=\{y\}$. We have
\begin{eqnarray*}
	\mathrm{sim}_\mathrm{Dice}(A,B) &=& 2/3 \quad\rightarrow\quad d(A,B) = 1/3\\
	\mathrm{sim}_\mathrm{Dice}(B,C) &=& 2/3 \quad\rightarrow\quad d(B,C) = 1/3\\
	\mathrm{sim}_\mathrm{Dice}(A,C) &=& 0/1 \quad\rightarrow\quad d(A,C) = 1
\end{eqnarray*}
which again fails the triangle inequality for $d$.
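Both counterexamples can be checked mechanically with Python sets (the elements are the same $A$, $B$, $C$ used above):

```python
def sim_overlap(X, Y):
    """Overlap similarity: |X n Y| / min(|X|, |Y|)."""
    return len(X & Y) / min(len(X), len(Y))

def sim_dice(X, Y):
    """Dice similarity: 2|X n Y| / (|X| + |Y|)."""
    return 2 * len(X & Y) / (len(X) + len(Y))

A, B, C = {'x'}, {'x', 'y'}, {'y'}

for sim in (sim_overlap, sim_dice):
    d = lambda U, V: 1 - sim(U, V)
    # The triangle inequality fails: d(A,B) + d(B,C) < d(A,C).
    assert d(A, B) + d(B, C) < d(A, C)
```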

\section{LSH for approximate near neighbor search}

The ``trick'' to this problem is simply to be careful with notation, and to recall the algebraic properties of the logarithm function. The basic facts are as follows:
\begin{enumerate}
	\item The dataset $\mathcal{A}$ is a set of $n$ points in a metric space with distance measure $d$.
		\begin{enumerate}
			\item Define $z\in\mathcal{A}$ to be a specified ``query point.'' 
			\item Assuming there exists a point $x\in\mathcal{A}$ such that $d(x,z)\leq\lambda$, our goal is to retrieve a point $x'\in\mathcal{A}$ with $d(x',z)\leq c\lambda$.
		\end{enumerate}
	\item The family of hash functions $\mathcal{H}$ is $(\lambda,c\lambda,p_1,p_2)$-sensitive.
	\item The family $\mathcal{G}$, formed as a $k$-way \verb=AND= of the elements of $\mathcal{H}$, is then $(\lambda,c\lambda,p_1^k,p_2^k)$-sensitive.
		\begin{enumerate}
			\item With $k=\log_{1/p_2}n$, we may simplify $p_2^k$ as follows: $p_2^k = (1/p_2)^{-\log_{1/p_2}n}=1/n$.
		\end{enumerate}
	\item Finally, we take $L=n^\rho$ random members $g_1,g_2,\cdots,g_L$ of $\mathcal{G}$, where $\rho=\frac{\log(1/p_1)}{\log(1/p_2)}$ (or equivalently $1/p_1 = (1/p_2)^\rho$), and hash all the points of $\mathcal{A}$ using all of the $g_i$'s.
\end{enumerate}
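The identities $p_2^k=1/n$ and $p_1^k=n^{-\rho}=1/L$ can be verified numerically (the values of $n$, $p_1$, $p_2$ below are illustrative; in practice $k$ and $L$ are rounded up to integers):

```python
import math

# Illustrative parameters: n points, a (lambda, c*lambda, p1, p2)-sensitive
# family with p1 > p2.
n, p1, p2 = 10_000, 0.9, 0.5

k = math.log(n) / math.log(1 / p2)         # k = log_{1/p2}(n)
rho = math.log(1 / p1) / math.log(1 / p2)  # rho = log(1/p1) / log(1/p2)
L = n ** rho                               # number of hash tables

assert math.isclose(p2 ** k, 1 / n)  # the AND-construction drives p2^k to 1/n
assert math.isclose(p1 ** k, 1 / L)  # and p1^k to n^{-rho} = 1/L
```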

\subsection{An upper-bound on false positives\label{subsec:falsepos}}

Let
\begin{itemize}
	\item $W_j = \{x\in\mathcal{A}\;|\;g_j(x)=g_j(z)\}$, \emph{i.e.} the subset of $\mathcal{A}$ whose elements map under $g_j$ to the same bucket as $z$.
	\item $T = \{x\in\mathcal{A}\;|\;d(x,z)>c\lambda\}$, \emph{i.e.} the set of points that do not match our search criterion. Ideally, the elements of $T$ should not share any hash bucket with $z$.
\end{itemize}

We wish to show that
\begin{equation}
	\mathbb{P}\left[\sum_{j=1}^L|T\cap W_j|>3L\right] < \frac{1}{3}
	\label{eq:falsepositives}
\end{equation}
\emph{i.e.} the probability that our LSH scheme produces more than $3L$ false positives (elements of $T$ that map to one of the buckets $g_j(z)$) is less than $1/3$.

We bound the expectation of the total number of false positives. By the symmetry of the problem with respect to the $g_j$'s,
\begin{equation}
	\mathbb{E}\left(\sum_{j=1}^L|T\cap W_j|\right) = L\cdot\mathbb{E}\left(|T\cap W_1|\right),
	\label{eq:gsymmetry}
\end{equation}
so it suffices to bound $\mathbb{E}\left(|T\cap W_1|\right)$. Fix a point $x\in{T}$. Since $\mathcal{G}$ is $(\lambda,c\lambda,p_1^k,1/n)$-sensitive and $d(x,z)>c\lambda$, the probability that $x\in W_1$ is at most $1/n$. By linearity of expectation over the elements of $T$ (of which there are at most $n-1$, since $z\in\mathcal{A}$ but $z\notin T$), we obtain $\mathbb{E}\left(|T\cap W_1|\right) \leq (n-1)\cdot\frac{1}{n} < 1$. Together with Eq.~\ref{eq:gsymmetry}, this yields $\mathbb{E}(\sum_{j=1}^L|T\cap W_j|) < L$.

We then apply the Markov inequality, using $\mathbb{E}(\sum_{j=1}^L|T\cap W_j|) < L$, to obtain Eq.~\ref{eq:falsepositives}:
\begin{equation}
	\mathbb{P}\left[\sum_{j=1}^L|T\cap W_j|>3L\right] \leq \frac{1}{3L}\,\mathbb{E}\left(\sum_{j=1}^L|T\cap W_j|\right) < \frac{L}{3L} = \frac{1}{3}.
  	\tag*{$\square$}
\end{equation}

\subsection{An upper-bound on false negatives\label{subsec:falseneg}}

Let $x^*\in\mathcal{A}$ be a point such that $d(x^*,z)\leq\lambda$. We wish to show that 
\begin{equation}
	\mathbb{P}\left[g_j(x^*)\neq g_j(z)\;(\forall 1\leq j\leq L)\right] < \frac{1}{e}
	\label{eq:falsenegatives}
\end{equation}

Since $\mathcal{G}$ is $(\lambda,c\lambda,p_1^k,1/n)$-sensitive and $d(x^*,z)\leq\lambda$, the probability that $g_j(x^*)=g_j(z)$ for any particular $j$ is at least $p_1^k$. Equivalently, the probability that $g_j(x^*)\neq g_j(z)$ for any particular $j$ is at most $1-p_1^k$; and it follows that the probability that $g_j(x^*)\neq g_j(z)$ for all $j$ is at most $\left(1-p_1^k\right)^L$.

Using our previous definitions $k=\log_{1/p_2}n$, $\rho=\frac{\log(1/p_1)}{\log(1/p_2)}$ and $L=n^\rho$, we have
\begin{eqnarray*}
	\mathbb{P}\left[g_j(x^*)\neq g_j(z)\;(\forall 1\leq j\leq L)\right] &\leq& \left(1-p_1^k\right)^L = \left(1-\left(\frac{1}{p_1}\right)^{-\log_{1/p_2}n}\right)^L\\
		&=& \left(1-\left(\frac{1}{p_2}\right)^{-\rho\log_{1/p_2}n}\right)^L = \left(1-\left(\frac{1}{p_2}\right)^{\log_{1/p_2}(1/L)}\right)^L\\
		&=& \left(1-\frac{1}{L}\right)^L < \frac{1}{e}.
\end{eqnarray*}
In the last line, we have used the fact that the sequence $\left(1-\frac{1}{m}\right)^m$ increases monotonically to its limit $e^{-1}$, so every finite term is strictly below $1/e$.\hfill$\square$

\subsection{$(c,\lambda)$-ANN has constant probability of success}

In the $(c,\lambda)$-ANN algorithm, we retrieve at most $3L$ data points from the buckets $g_j(z)$ ($1\leq j \leq L$) and report the closest one as a $(c,\lambda)$-ANN. Let us now bound the probability of error.

Note that we assume that there is at least one point $x^*\in\mathcal{A}$ with $d(x^*,z)\leq\lambda$. Clearly, if there are more points that satisfy the distance criterion, the probability of success will be higher.

The algorithm can fail in two ways:
\begin{itemize}
	\item The element $x^*$ shares no bucket with $z$, \emph{i.e.} $g_j(x^*)\neq g_j(z)$ for all $j$. This ``false negative'' event was considered in Section~\ref{subsec:falseneg}, where its probability was bounded by $e^{-1}$.
	\item The element $x^*$ shares a bucket with $z$, but more than $3L$ false positives (elements of $T$ in the buckets $g_j(z)$) crowd $x^*$ out of the $3L$ retrieved candidates. This ``false positive'' scenario was considered in Section~\ref{subsec:falsepos}, where its probability was bounded by $1/3$.
\end{itemize}

The corresponding probability of failure is
\begin{equation*}
	\mathbb{P}_\mathrm{fail} < e^{-1}+\left(1-e^{-1}\right)\cdot\frac{1}{3} \approx 0.58,
\end{equation*}
and hence the reported point is an actual $(c,\lambda)$-ANN with constant probability.\hfill$\square$
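As a quick arithmetic check of this bound:

```python
import math

p_fn = 1 / math.e                  # false negative: x* misses every bucket
p_fp = 1 / 3                       # false positive: > 3L spurious candidates
p_fail = p_fn + (1 - p_fn) * p_fp  # x* missed, or present but crowded out
assert p_fail < 0.58               # success probability is a constant > 0.42
```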

\subsection{$(c,\lambda)$-ANN on a dataset of images}

\subsubsection{Running time}

On my laptop, the average search time for LSH is $16$~ms, while the linear (exhaustive) search takes $216$~ms.

\subsubsection{Error as a function of $L$ and $k$\label{subsubsec:errorLk}}

Fig.~\ref{fig:relerror} shows the relative error of LSH with respect to the linear search as a function of $L$ and $k$. In the ideal case, this measure approaches $1$.
\begin{figure}[t]
	\begin{center}
		\includegraphics[width=0.9\textwidth]{p4d_results.pdf}
	\end{center}
	\caption{The relative error between LSH and exhaustive search results as a function of $L$ and $k$. We compare the distances of $3$ nearest neighbors under LSH and linear search. A relative error of $1$ is ideal.\label{fig:relerror}}
\end{figure} 

\subsubsection{Top $10$ nearest neighbors using the two methods}

In Fig.~\ref{fig:viscomp}, I show the top $5$ (rather than top $10$, which does not fit on the page as nicely) LSH and linear-search results for three different image patches. I used $L=10$, $k=16$, since that parameter set yielded the best performance as shown in Section~\ref{subsubsec:errorLk}. In most cases, the visual agreement between the two methods is quite good. (The quantitative distance measures are also comparable.)

\begin{figure}[p]
	\begin{center}
		\includegraphics[height=0.9\textheight]{combined.pdf}
	\end{center}
	\caption{Visual comparison of top $5$ nearest neighbors according to LSH and linear (exhaustive search). Three query images (indices of $900$, $600$ and $200$) are shown.\label{fig:viscomp}}
\end{figure} 

\end{document}