\documentclass[12pt]{exam}
\usepackage[utf8]{inputenc} % For UTF8 source encoding.
\usepackage{amsmath} % For displaying math equations.
\usepackage{amsfonts} % For mathematical fonts (like \mathbb{E}!).
\usepackage{upgreek} % For upright Greek letters, such as \upvarphi.
\usepackage{wasysym} % For additional glyphs (like \smiley!).
% For document margins.
\usepackage[left=.8in, right=.8in, top=1in, bottom=1in]{geometry}
\usepackage{lastpage} % For a reference to the number of pages.
% TODO: Enter your name here :)
\newcommand*{\authorname}{[Your name goes here]}
\newcommand*{\psetnumber}{0}
\newcommand*{\psetdescription}{Concept Refresher}
\newcommand*{\duedate}{Tuesday, April 9th}
\newcommand*{\duetime}{2:30 pm}
% Fancy headers and footers
\headrule
\firstpageheader{CS166\\Spring 2019}{Problem Set \psetnumber\\\psetdescription}{Due: \duedate\\at \duetime}
\runningheader{CS166}{Problem Set \psetnumber}{\authorname}
\footer{}{\footnotesize{Page \thepage\ of \pageref{LastPage}}}{}
% Exam questions.
\newcommand{\Q}[1]{\question{\large{\textbf{#1}}}}
\qformat{} % Remove formatting from exam questions.
% Useful macro commands.
\newcommand*{\ex}{\mathbb{E}}
\newcommand*{\bigtheta}[1]{\Theta\left( #1 \right)}
\newcommand*{\bigo}[1]{O \left( #1 \right)}
\newcommand*{\bigomega}[1]{\Omega \left( #1 \right)}
\newcommand*{\prob}[1]{\text{Pr} \left[ #1 \right]}
\newcommand*{\var}[1]{\text{Var} \left[ #1 \right]}
% Custom formatting for problem parts.
\renewcommand{\thepartno}{\roman{partno}}
\renewcommand{\partlabel}{\thepartno.}
% Framed answers.
\newcommand{\answerbox}[1]{
\begin{boxed}
\hspace{\fill}
\vspace{#1}
\end{boxed}}
% MZ
\usepackage{amsthm}
\usepackage{amssymb}
\let\oldemptyset\emptyset
\renewcommand{\emptyset}{\text{\O}}
\renewcommand\qedsymbol{$\blacksquare$}
\newenvironment{prf}{{\bfseries Proof.}}{\qedsymbol}
\newcommand{\bi}[1]{\textit{\textbf{#1}}}
\newcommand{\annotate}[1]{\textit{\textcolor{blue}{#1}}}
\usepackage{stmaryrd}
\makeatletter
\@namedef{ver@framed.sty}{9999/12/31}
\@namedef{opt@framed.sty}{}
\makeatother
\usepackage{minted}
\usepackage{mathtools}
\usepackage{alltt}
\printanswers
\setlength\answerlinelength{2in} \setlength\answerskip{0.3in}
\begin{document}
\title{CS166 Problem Set \psetnumber: \psetdescription}
\author{\authorname}
\date{}
\maketitle
\thispagestyle{headandfoot}
\begin{center}
\section*{Section One: Mathematical Prerequisites}
\end{center}
\begin{questions}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Q{Problem One: Fibonacci Fun! (3 Points)}
The Fibonacci numbers are a famous sequence defined as
\[
F_0 = 0 \qquad F_1 = 1 \qquad F_{n+2} = F_n + F_{n+1}
\]
For example, the first few terms of the Fibonacci sequence are
\[
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \ldots
\]
There's a close connection between the Fibonacci numbers and the quantity $\upvarphi=\frac{1+\sqrt{5}}{2}$, the \bi{golden ratio}. In case you're wondering where that number comes from, the golden ratio is the positive root of the quadratic equation $x^2 = 1 + x$.
We'd like you to prove some results about the Fibonacci numbers. In what follows, please do not use any properties of Fibonacci numbers other than what's given in the definition above. The purpose of this problem is to make sure you're comfortable reasoning about terms from first principles.
\begin{parts}
\part Using the formal definition of big-O notation, prove that $F_n = \bigo{\upvarphi^n}$. To do so, find explicit choices of the constants $c$ and $n_0$ for the definition of big-O notation, then use induction to prove that those choices are correct.
Our proofwriting style expectations are along the lines of what you'd see in CS161. Write in complete sentences rather than bullet points, use mathematical notation when appropriate but not as a stand-in for plain English, etc. Remember that an actual person is going to be reading your proof, so be nice to them by writing a lucid, clear argument that respects the intelligence of the reader but doesn't ask them to do the heavy lifting for you. \smiley
\begin{solution}
Your solution goes here!
\end{solution}
\pagebreak
\part Along the lines of part (i) of this problem, using the formal definition of big-$\Omega$ notation, prove that $F_n = \bigomega{\upvarphi^n}$.
\begin{solution}
Your solution goes here!
\end{solution}
\end{parts}
You've just proved that $F_n = \Theta(\upvarphi) = \Theta(\upvarphi^n)$, which is not immediately obvious! Fibonacci numbers show up in lots of algorithms and data structures, and what you've just proved will definitely make an appearance later this quarter.
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Q{Problem Two: Probability and Concentration Inequalities (4 Points)}
The analysis of randomized data structures sometimes involves working with sums of random variables. Our goal will often be to get a tight bound on those sums, usually to show that some runtime is likely to be low or that some estimate is likely to be good. If we only know two pieces of information about those random variables (what their expected values are and that they're nonnegative), we can get some information about how their sums behave.
\begin{parts}
\part Let $X_1, X_2, \ldots, X_n$ be a collection of $n$ nonnegative random variables such that $\ex\left[X_i\right] = 1$ for each variable $X_i$. (Note that these random variables might not be independent of one another.) Prove that $\prob{\sum_{i=1}^n X_i \geq 2n} \leq \frac{1}{2}$. You may want to use Markov's inequality.
\begin{solution}
Your solution goes here!
\end{solution}
\uplevel{Sometimes you'll find that the sort of bound you get from an analysis like part (i) isn't strong enough to prove what you need to prove. In those cases, you might want to start looking more at the spread of each individual random variable. If, for example, you know the variances of those variables are small, you might be able to get a tighter bound.}
\part Let $X_1, X_2, \dots, X_n$ be a collection of $n$ nonnegative random variables. As in part (i), you know that $\ex[X_i] = 1$ for each variable $X_i$. But now suppose you know two other facts. First, you know that each variable has unit variance (the fancy way of saying $\var{X_i} = 1$ for each variable $X_i$). Second, while you don't know for certain whether these variables are independent of one another, you know that they're \bi{pairwise uncorrelated}. That is, you know that $X_i$ and $X_j$ are uncorrelated random variables for any $i \neq j$. Under these assumptions, prove that $\prob{\sum_{i=1}^n X_i \geq 2 n} \leq \frac{1}{n}$. You may want to use Chebyshev's inequality.
\begin{solution}
Your solution goes here!
\end{solution}
\uplevel{The analysis in part (ii) only works if the variables are pairwise uncorrelated.}
\part Pick a natural number $n > 0$ and define a collection of random variables $X_1, X_2, \ldots, X_n$ such that
\begin{itemize}
\item each $X_i$ is nonnegative,
\item $\ex[X_i] = 1$ for each variable $X_i$,
\item $\var{X_i} = 1$ for each variable $X_i$, but
\item $\prob{\sum_{i=1}^n X_i \geq 2n} > \frac{1}{n}$.
\end{itemize}
Once you've done this, go back to your proof from part (ii) and make sure you can point out the specific spot where the math breaks down once you remove the requirement that the Xi's be pair- wise uncorrelated.
\begin{solution}
Your solution goes here!
\end{solution}
\uplevel{As you saw in this problem, learning more about the distribution of random variables makes it easier to provide tighter bounds on their sums, and correlations across those variables makes it harder. This is a good intuition to have for later in the quarter, where we'll be discussing how different assumptions on hash functions lead to different analyses of data structures.}
\end{parts}
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{center}
\section*{Section Two: Algorithmic Prerequisites}
\end{center}
\Q{Problem Three: Binary Search Trees (4 Points)}
This one is all coding, so you don't need to write anything here. Make sure to submit your final implementation on myth.
\vspace{.2 in}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Q{Problem Four: Event Planning (4 Points)}
You're trying to figure out what Fun and Exciting Things you'd like to do over the weekend. You download a list of all the local events going on in your area. Each event is tagged with its location, which you can imagine is a point in the 2D plane. (We'll pretend that the world is flat, at least in a small neighborhood around your location. Thanks, multivariable calculus.) You also have your own $(x, y)$ location.
Design a algorithm that, given some number $k$, returns a list of the $k$ events that are closest to you, sorted by increasing order of distance. Your algorithm should run in time $\bigo{n + k \log k}$, where $n$ is the number of nearby events. Then prove your algorithm is correct and meets the required time bounds.
Some specific details and edge cases to watch for:
\begin{itemize}
\item You can assume, for simplicity, that no two events are at the same distance from you.
\item By ``distance'', we mean Euclidean distance. We're already assuming the world is flat, so while we're at it seems pretty reasonable to also ignore things like roads and speed limits. \smiley
\end{itemize}
As a hint, think about the algorithms you studied in CS161 and see if any of them would make for good subroutines.
To make things easier for the grader, we recommend doing the following when writing up your solution:
\begin{enumerate}
\item Start off by giving a quick, two-sentence, high-level description of your approach. This makes it easier for the grader to contextualize what it is that you're trying to do.
\item Next, go into more detail. Describe how your algorithm works, one step at a time. Please don't write actual code unless it's exceptionally well-commented and serves a purpose that plain English couldn't. (Trust us -- from experience, reading code is often much harder than reading prose!)
\item Write a quick correctness proof. Tell us what, specifically, you're going to prove, then go prove it. Our proof expectations are similar to those for CS161 -- write in complete sentences, use mathematical notation when appropriate but not as a stand-in for plain English, etc.
\item Write a runtime analysis. Go at whatever level of detail seems most appropriate.
\end{enumerate}
A note on this problem, and other problems going forward: when measuring runtime in the context of algorithms and data structures, it's important to distinguish between \bi{deterministic} and \bi{randomized} algorithms. There's a lot of research into how to take \emph{randomized} algorithms with a nice \emph{expected} runtime and convert them into \emph{deterministic} algorithms with a nice \emph{worst-case} runtime. Since this problem set is designed as a warm-up, we'll accept either a deterministic algorithm with a worst-case runtime of $O(n + k \log k)$ or a randomized algorithm with an expected runtime of $\bigo{n + k \log k}$, though in the future we'll tend to be a bit stricter about avoiding randomness.
\begin{solution}
Your solution goes here!
\end{solution}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{questions}
\end{document}