\documentclass[11pt]{article}
% Problem set: do not define TEMPLATE or SOL
% LaTeX template: define TEMPLATE but not SOL
% Solution: define both TEMPLATE and SOL
%
%\def\sol{1}
\def\template{1}
% STUDENTS:
% You can enter your solutions into this template by just
% typing them right after it says "SOLUTION".
% Ignore the \ifdefined\sol blocks; you can delete them if you want.
% (Note that it won't compile if you try to uncomment the
% solution flag :))
%
\usepackage{fullpage,graphicx}
\usepackage{amsmath,amsfonts,amsthm,amssymb,multirow}
\usepackage{algorithmic,comment,url,enumerate}
\usepackage{tikz}
\usepackage{framed}
\usetikzlibrary{decorations.pathreplacing, shapes}
\usepackage[ruled,vlined,commentsnumbered,titlenotnumbered]{algorithm2e}
\newcommand{\expecting}[1]{\noindent{[\textbf{We are expecting:} \em #1\em]}}
\newcommand{\hint}[1]{\noindent{[\textbf{HINT:} \em #1 \em]}}
\newcommand{\pts}[1]{\textbf{(#1 pt.)}}
\newcommand{\sgn}{\mathrm{sgn}}
\definecolor{shadecolor}{gray}{0.95}
\newcommand{\bE}{\mathbb{E}}
\newcommand{\Ex}{\mathbb{E}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\Var}{\mathrm{Var}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\rank}{\mathrm{rank}}
\newcommand{\eps}{\epsilon}
\newtheorem{theorem}{Theorem}
\newtheorem{assumption}{Assumption}
\newtheorem{definition}{Definition}

\begin{document}
\noindent
\textbf{Problem Set 4 \ifdefined\sol Solution \fi} \hfill CS265/CME309, Winter 2026
\ifdefined\sol\else
\newline
Due: Friday 2/6, 11:59pm on Gradescope
\ifdefined\template
\newline 
Group members: INSERT NAMES HERE
\fi

\vspace{.4cm}\noindent
Please follow the homework policies on the course website.
\fi

\noindent
\rule{\linewidth}{0.4pt}

\noindent


\begin{enumerate}

%%%%%%%%% PROCESSES AND CPUs %%%%%%%

\item \pts{8} \textbf{Processes and CPUs}

Suppose that in a distributed system, we have $N$ CPUs and $P$ processes. 
Each process is independently and uniformly allocated to a CPU. However, if multiple processes are allocated to the same CPU, the CPU will choose one of them at random to complete; the remaining processes allocated to that CPU will not be completed.

\begin{enumerate}
    \item \pts{2} 
What is expected number of processes that will be completed?
    \item \pts{6} Suppose that $P \geq N$, and denote the total number of completed processes by $C$. \textbf{Use Poissonization} to prove that $\Pr\left[C \leq \frac{N}{2}\right] \leq e^{-\Omega(N)}.$

[\textbf{Note:} There may be ways to do this problem that don't involve Poissonization, but we want you to use it to get practice with it.  That is, you should prove the statement by analyzing the case when the number of processes is an appropriate Poisson random variable.  Don't forget the de-Poissonization step!]

   \item \pts{0} \textbf{[Optional: this won't be graded.]} Let $\mu$ be your answer from part (a).  Under what conditions on $P$ and $N$ can you use Poissonization to show that $\Pr[ C \leq (1 -\delta)\mu ] \leq e^{-\Omega_\delta(N)}$ for any $\delta > 0$, where the $\Omega_\delta$ notation means that you are allowed to have constants that depend on $\delta$ hidden inside the big-Omega.

    \ifdefined\template
\begin{shaded}
	\textbf{SOLUTION:}
	\ifdefined\sol
	\input{solution/poissonization}
	\fi
\end{shaded}
\fi
\end{enumerate}




%%%%%% BOURGAIN'S EMBEDDING FOR \ell_p %%%%%%%%
\vspace{1cm}
\item \pts{4} \textbf{[Bourgain's embedding for $\ell_p$.]} We showed that Bourgain's embedding allows us to embed an arbitrary metric space $(X, d)$ with $|X| = n$ into $(\R^k, \ell_1)$ with target dimension $k$ being $O((\log n)^2)$ and distortion being $O(\log n)$. Moreover, the embedding can be computed efficiently using a randomized algorithm. Prove that the exact same embedding computed by the randomized algorithm also achieves $O(\log n)$ distortion with high probability when the target metric is $\ell_p$ for $p > 1$.
We encourage you to emphasize only the differences from the proof in the lecture notes rather than copying the entire proof.\\
\hint{Let $f:X\rightarrow \R^k$ denote the relevant embedding. For any two points $x,y\in X$, we showed that $\|f(x) - f(y)\|_1 \leq k\cdot d(x,y)$. Can we say something similar about $\|f(x) - f(y)\|_p$?}\\
\hint{For any two points $a,b\in\R^k$ and $p>1$, it holds that $\|a - b\|_p \geq k^{(1/p) - 1}\|a - b\|_1$. This is a special case of H\"older's inequality.}

\ifdefined\template
\begin{shaded}
\textbf{SOLUTION:}
\ifdefined\sol
\input{solution/Bourgain.tex}
\fi
\end{shaded}
\fi

\newpage
%%%%%%%%% Applications of Fast JL %%%%%%%%%%

\item \textbf{[Fast approximate regression]}
In this problem, we will see an application of (fast\footnote{You can also do a similar application with sparse JL.}) JL to numerical linear algebra; in particular, to approximate regression.

Recall the standard \emph{regression} problem: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$ (with $n \geq d$, so that $A$ has full column rank), the goal is to find 
\[ x^* = \mathrm{argmin}_{x \in \mathbb{R}^d} \|Ax - b\|_2^2.\]
Traditional solvers for this take time about $O(nd^2)$ when $n \gg d$.




In this problem we will find a faster algorithm to find an $\hat{x}$ that is an \emph{approximately} optimal solution, with high probability.



\begin{enumerate}
    \item \pts{6} Consider the following definition:
    \begin{definition}[Subspace Embedding]
A distribution on matrices $\Phi \in \mathbb{R}^{m \times n}$ is a \textbf{$(\epsilon, \delta)$-subspace embedding for subspaces of dimension $k$} if the following holds:

For any $k$-dimensional subspace $U \subseteq \mathbb{R}^n$, with probability at least $1 -\delta$ over the choice of $\Phi$, 
\[ (1 - \epsilon)\|u\|^2_2 \leq \|\Phi u\|^2_2 \leq (1 + \epsilon)\|u\|^2_2 \]
for all $u \in U$.
\end{definition}
    
    Suppose that $\Phi$ is a random matrix in $\mathbb{R}^{m \times n}$ so that:
    \begin{itemize}
        \item The distribution of $\Phi$ is an $(\epsilon=0.01, \delta=0.01)$ subspace embedding for subspaces of dimension~$k$.
        \item $m = O\left(k\right)$.
        \item For any vector $y \in \mathbb{R}^n$, you can do the matrix-vector multiplication\footnote{Note that you don't store $\Phi$ explicitly; you can sample and store a small representation of it very quickly, and then when it's time to do a matrix-vector multiplication, you can use your small representation to do that in time $O( n \log n)$.} $\Phi y$ in time $O(n \log n)$, assuming that $k \ll n^{1/3}$.
    \end{itemize}

    Use $\Phi$ to solve the following problem: Suppose that $d \ll n^{1/3}$.  Design an algorithm that does the following:
    \begin{itemize}
        \item Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $\hat{x} \in \mathbb{R}^d$ so that, with probability at least $0.99$,
        \[ \|A\hat{x} - b\|^2_2 \leq 1.1 \min_{x \in \mathbb{R}^d} \|Ax - b\|^2_2\]
        \item Your algorithm should run in time $O(d^3 + dn \log n) = O(dn \log n)$.  In particular, this is faster than $O(n d^2)$ if $d \gg \log n$.
    \end{itemize}

    Clearly state your algorithm, and prove that it has the desired properties, using the properties of $\Phi$.  You may assume that you can solve $\tilde{n} \times \tilde{k}$ regression problems in time $O( \tilde{n} \tilde{k}^2)$ for any $\tilde{n}, \tilde{k}$.
\newpage
    \item \pts{0} \textbf{[Optional: this part will not be graded.]}  In this optional part, you will come up with the distribution $\Phi$ from part (a).  The starting point is \emph{fast} JL transforms.

    We mentioned in the lectures that there are JL transforms that are faster to multiply by than a dense Gaussian matrix.  Here's a formal statement about such a Fast JL transform.  (Notice that this gives a guarantee about how much $\Phi$ distorts any particular vector $x$.  To get a JL guarantee for all pairs of points $x,y$ in a set $X$, take a union bound on top of this statement).

\begin{theorem}[Implicit in (Ailon, Chazelle, 2009)]\label{thm:AC}
Fix $n > m$.  There is a distribution on matrices $\Phi \in \mathbb{R}^{m \times n}$ so that the following both hold:
\begin{itemize}
\item There is a constant $c$ so that for all $\epsilon >  0$ and any $x \in \mathbb{R}^n$, 
\[ \Pr_{\Phi}[ (1 - \epsilon)\|x\|^2_2 \leq \|\Phi x\|^2_2 \leq (1+\epsilon)\|x\|^2_2 ] \geq 1 - 2\exp(-cm\epsilon^2).\]
\item For $x \in \mathbb{R}^n$, a matrix-vector multiplication of the form $\Phi \cdot x$ can be done in time $O(n \log n + \min\{nm, m^3\}).$
\end{itemize}
\end{theorem}

Show (using appropriate constraints on the values of $n$ and $d$) that the distribution on matrices $\Phi$ in Theorem~\ref{thm:AC} satisfies the requirements on $\Phi$ in part (a).

\hint{You may want to look at the ``Bonus'' material on sparsity that we are skipping this quarter; it's available on the course website.  In particular, try making an $\epsilon$-net of a fixed subspace $U$ and taking a union bound over all points in the net, mimicking what we did to prove that our JL matrix has the RIP.}
\end{enumerate}

\ifdefined\template
\begin{shaded}
\textbf{SOLUTION:}
\ifdefined\sol
\input{solution/fastregression.tex}
\fi
\end{shaded}
\fi


\end{enumerate}
\end{document}
