\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{listings}
\usepackage{enumitem}
\usepackage{hyperref}
\usepackage{algorithmicx,algorithm}
\usepackage{algpseudocode}
\usepackage{float}
\usepackage{etoolbox}
\usepackage{url}
\usepackage[most]{tcolorbox}
\usepackage{xcolor}
\usepackage{minted}
% 0.0in to center
\setlength{\topmargin}{-.53in}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{0.0in}
\setlength{\evensidemargin}{0.0in}
\setlength{\textheight}{9in}
%\setlength{\parskip}{\medskipamount}
\setlength{\marginparwidth}{1.2in}
%\renewcommand{\baselinestretch}{1.3}
\lstset{
language=Python,
showstringspaces=false,
formfeed=newpage,
tabsize=4,
commentstyle=itshape,
morekeywords={models, lambda, forms}
}
\newtheorem{theorem} {Theorem}
\newtheorem{lemma} [theorem] {Lemma}
\newtheorem{remark} [theorem] {Remark}
\newtheorem{fact} [theorem] {Fact}
\newtheorem{claim} [theorem] {Claim}
\newtheorem{corollary} [theorem] {Corollary}
\newtheorem{definition} [theorem] {Definition}
\newtheorem{prop} [theorem] {Proposition}
\newtheorem{warning} [theorem] {Warning}
\def\proof{\rm \trivlist \item[\hskip \labelsep{\bf Proof. }]}
\def\endproof{\hfill $\rule{1.3ex}{1.3ex}$ \endtrivlist}
\newcommand{\ip}[2]{
{\langle {#1} , {#2} \rangle}
}
\newcommand{\n}[1]{
{\| {#1} \|}
}
\newcommand{\ns}[1]{
{\| {#1} \|^2}
}
\providetoggle{solution}
\settoggle{solution}{false}
\def\prop{{\em Proposition: }}
\def\L{\mathcal{L}}
\def\N{\mathcal{N}}
\def\C{\mathcal{C}}
\def\Z{\mathcal{Z}}
\def\Q{\mathcal{Q}}
\def\R{\mathcal{R}}
\def\P{\mathcal{P}}
\def\M{\mathcal{M}}
\def\B{\mathcal{B}}
\def\TV{$T \in \mathcal{L}(V)$ }
\def\pf{\noindent {\bf Proof Sketch: }}
%\def\limsup{\overline{\lim}}
%\def\liminf{\underline{\lim}}
\def\sm{\setminus}
\newcommand{\bfx}{{\bf x}}
\newcommand{\bfy}{{\bf y}}
\begin{document}
\noindent
\mbox{CS168, Spring
2019~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
\vspace{.2in}
\begin{center}
{\LARGE Mini-Project \#8}
\end{center}
\begin{center}
{\large Due by 11:59 PM on \textbf{Wednesday}, May 29th.}
\\
\\
{\large FirstName LastName, FirstName LastName}
\end{center}
\section*{Instructions}
\begin{itemize}
\item You can work individually or with one partner. If you work in a pair,
both partners will receive the same grade.
\item Detailed submission instructions can be found on the course website
(\url{http://cs168.stanford.edu}) in the ``Coursework - Assignment'' section.
If you work in pairs, \textbf{only one member} should submit
all of the relevant files.
\item Use 12pt or higher font for your writeup, and please you the solution template posted on the course webpage.
\item Make sure the plots you submit are easy to read at a normal zoom level.
\item If you've written code to solve a certain part of a problem, or if the part explicitly asks you to implement
an algorithm, you must also include the code in your pdf submission.
\item Code marked as Deliverable should be pasted into the relevant section. Keep variable names consistent
with those used in the problem statement, and with general conventions. No need to include import
statements and other scaffolding, if it is clear from context. Use the \verb|verbatim| environment to paste
code in LaTeX.
\begin{verbatim}
def example():
print "Your code should be formatted like this."
\end{verbatim}
\item {\bf Reminder:} No late assignments will be accepted, but we will drop your
lowest assignment grade when calculating your final grade.
% \item \textbf{Please include all code at the end of the Gradescope submission, i.e. after all your answers to all the questions. You do not need to mark the pages having the code when you mark pages containing answers to questions on Gradescope.}
\end{itemize}
\medskip
\medskip
\begin{enumerate}
\item {\bf Fourier Transforms.} Recall that the discrete Fourier transform of a length $N$ vector/signal ${\bf f}$ is a length $N$ complex-valued vector/signal ${\bf F}$ whose $m$th coordinate is defined to be
\begin{equation*}
{\bf F}[m] = \sum_{j = 0}^{N - 1} {\bf f}[j]e^{-2\pi i mj / N},\ m = 0, 1, \dots, N - 1
\end{equation*}
We denote the Fourier transform of a vector ${\bf f}$ by $\mathcal{F}{\bf f},$ and use $\mathcal{F}^{-1}$ to denote the inverse transformation. Hence for ${\bf F}$ as defined above, we have ${\bf F} = \mathcal{F}{\bf f}$ and ${\bf f} = \mathcal{F}^{-1}{\bf F}$.\\ \\
(14 points) For each vector/signal depicted in the left column, 1) draw a line connecting it with the plot on the right of the magnitude of its (discrete) Fourier transform, and 2) write one sentence justifying your answer in a high-level intuitive way (e.g. ``The Fourier transform of random noise is usual BLAH, and the plot at left looks like random noise and the plot at right looks like BLAH.''). So that you need to look at the shapes of the plots, the scale on the y-axis of the plots in the right column have been removed.\newline
\noindent
\begin{tcolorbox}
Table or list of the 7 indices of the corresponding right plot and the 7 one sentence explanations.
\end{tcolorbox}
{\bf Deliverables: } For each of the 7 plots on the left: an index of the corresponding plot on the right, and one sentence explanation.
\newpage
%\begin{figure}[H]\centering
%\includegraphics[width=650pt]{Presentation3.pdf}
%\end{figure}
\newpage
\item \textbf{Convolution.}
\noindent A concept that is closely related to the discrete Fourier transform is \textit{(linear) convolution}. Given a $N$-tuple $\bf f$ and a $M$-tuple $\bf g$, define their convolution ${\bf f} \ast {\bf g}$ to be a $(M + N - 1)$ tuple:
\footnote{We define ${\bf g}[k - j] = 0$ when $k - j < 0,$ or when $k-j\ge length(\bf g)$.}
\begin{equation*}
({\bf f} \ast {\bf g})[k] = \sum_{j = 0}^{N - 1}{\bf f}[j]{\bf g}[k - j]
\end{equation*}
\begin{enumerate}[label=(\alph*)]
\item (4 points)
Given a six-sided die, let $p$ be a 6-tuple where $p[i]$ denotes probability that $i + 1$ appears. Let $q = \underbrace{p \ast p \ast \cdots \ast p}_\text{100 times}$. What event occurs with probability $q[150]$? [Assume zero-index vectors.]
\begin{tcolorbox}
Answer.
\end{tcolorbox}
\newpage
\item (4 points)
Given an $N$-tuple $\bf f$, define ${\bf f}^+$ to be the $2N$-tuple obtained by padding $\bf f$ with zeros:
\begin{equation*}
{\bf f}^+[i] =
\begin{cases} {\bf f}[i] &\mbox{if } 0 \leq i \leq N - 1 \\
0 & \mbox{if } N \leq i \leq 2N - 1 \end{cases}
\end{equation*}
Verify that convolution is multiplication under the Fourier transform: for any two $N$-tuples $\bf f$ and $\bf g$, show that
\begin{equation*}
\mathcal{F}({\bf f} \ast {\bf g}) = \mathcal{F}{\bf f}^+ \cdot \mathcal{F}{\bf g}^+
\end{equation*}
where $\cdot$ denotes element-wise multiplication. [You should not just quote the lecture notes---you must write out the calculation explicitly.] Suggest an implementation of convolution using the Fast Fourier Transform and its inverse transform. What is the analogous conclusion you can make when the two tuples have different lengths?
%\item (3 points)
%For two $N$-tuples $\bf f$ and $\bf g$, what is the relationship between $\mathcal{F}(({\bf f} \cdot {\bf g})^+)$ and $\mathcal{F}{\bf f} \ast \mathcal{F}{\bf g}$?
\begin{tcolorbox}
Calculation:\\
Suggestion:\\
Analogous conclusion:
\end{tcolorbox}
\newpage
\item (5 points)
Using the Fast Fourier Transform (and its inverse transform), write a method $multiply(x, y)$ that takes in two arrays of digits, each representing an integer (lower indices represent lower digits), and return an array that represents the product of the two integers. For example, the output of $[0, 1, 2, 3]$ and $[4, 5, 6]$ should be $[0, 4, 3, 9, 9, 0, 2]$, representing the fact that $3,210 \times 654 = 2,099,340$. Using your code, what is the product of the following two numbers: $$x=12345678901234567890, \text{ and } y = 987654321098765432109876543210?$$ [For this part, please embed your \emph{actual} code in the solution, instead of including it in the appendix. Do not directly use convolution functions in your code.]
\begin{tcolorbox}
Answer:\\
Embedded code:
\end{tcolorbox}
\newpage
\item (4 points) Compare the asymptotic run-time of this multiplication algorithm to that of the naive grade-school integer multiplication algorithm. Feel free to refer to the runtime of the Fast Fourier Transform algorithm that we discussed in class.
\begin{tcolorbox}
Discussion:
\end{tcolorbox}
\newpage
\item
(Bonus: 2 points) For your implementation for part (c), roughly how large can the inputs be for the method to return accurate results? Justify your response via experiments and a discussion of the Fourier transform and the known limitations of your program environment (i.e. number of significant bits stored, etc.).
\end{enumerate}
\begin{tcolorbox}
Discussion:
\end{tcolorbox}
\newpage
\noindent {\bf Deliverables:} answer for (a); calculation, suggestion, and analogous conclusion for (b); code and answers for (c); discussion for (d) and (e).
\newpage
\item \textbf{The Sound of Fourier Transform.}
In this part, we explore how the Fourier transform can help us understand human sounds. Last year, the Laurel/Yanny audio clip ignited the internet. Some people hear the clip play as ``Laurel'' and some hear it as ``Yanny''. Several news outlets have posted explanations of the different perceptions, and made tools that let you perceive both interpretations. In this problem, you'll load the Laurel/Yanny audio clip, and process it in several ways that accentuate the ``Laurel'' or ``Yanny'' perceptions.
\medskip It is easy to load an audio file, such as the LaurelYanny.wav file we provide, into a Matab or Python numpy array. In both cases, you also need to keep track of the ``sample rate'' of the file, which corresponds to the number of datapoints per second. In Matlab, you can do this by:
\begin{lstlisting}
[data, sampleRate] = audioread('laurel_yanny.wav')
\end{lstlisting}
%recObj = audiorecorder; % recorder configuration
% recordblocking(recObj, 3); % records 3 seconds of audio
% data = getaudiodata(recObj); % the vector representing the audio
% sampleRate = 8000; % default sample rate
If you're working with Python, you can load a \verb|.wav| file as a numpy array in the following way:
%\footnote{There seems to be a bug in the way the {\bf scipy.io.wavfile} package processes some PPC {\bf .wav} files. If {\bf read()} throws an exception in your code, try to save the file in a different architecture, or use the Python default {\bf wave} package.}
\begin{lstlisting}
import scipy.io.wavfile as wavfile
import numpy as np
with open("laurel_yanny.wav", "rb") as f:
sampleRate, data = wavfile.read(f)
\end{lstlisting}
%if len(data.shape) = = 2 and data.shape[1] = = 2:
% data = data[:,1] '''remove the second channel if exists'''
\medskip
You can also listen to the sound represented by an array.
The Matlab command \verb|soundsc(data, sampleRate)| will play the waveform, sampled at a rate of \emph{sampleRate} over your computer speakers. If you're using Python,
you'll need to save the signal array as a \verb|.wav| file and then listen to the file:
\begin{lstlisting}
data = (data * 1.0/np.max(np.abs(data))*32767).astype(np.int16)
with open("output.wav", "wb") as f:
wavfile.write(f, sampleRate, data)
\end{lstlisting}
{\bf Exercises: }
\begin{enumerate}[label=(\alph*)]
\item (0 points)
Download and play the laurel/yanny recording. What do you hear? Most people hear ``Laurel'' or ``Yanny''/``Yarry'' or both.
\begin{tcolorbox}
N/A
\end{tcolorbox}
\newpage
\item (1 point)
Load the signal into an array using the instructions above and plot the waveform/signal that this 43,008 point array represents. (You will have a plot where the $x$-axis of the plot is time, and the $y$-axis is the physical position/displacement of the speaker cone.)
\begin{tcolorbox}
Plot
\end{tcolorbox}
\newpage
\item (3 point)
Take the discrete Fourier transform of the signal, and plot the real number corresponding to the magnitude of each coefficient of the Fourier transform. Describe what you see in words. Why should you expect this?
\begin{tcolorbox}
Plot:\\
Description of plot:\\
Explanation:
\end{tcolorbox}
\newpage
\item (5 points)
Plot the ``spectrogram'' of the signal, which illustrates how the amount of high/low frequencies change over the course of the audio clip. Specifically, chop the signal up into 500-sample long blocks, and compute the Fourier transform for each chunk. Plot a heatmap (in Python, plt.imshow() with cmap='hot'; in Matlab, imagesc()) where the x axis corresponds to the index of the chunk, and the y values corresponds to frequencies, and the value at location $x,y$ corresponds to the magnitude of the $y$th Fourier coefficient for the $x$th chunk. Since frequencies beyond the 80th Fourier coefficient are too high for humans to hear, the $y$-axis should just go from 1 to 80. Turn in this spectogram---you should see some curious structure. (If the contrast is too low to see clearly, you can make it higher by taking the square-root of each entry before plotting it....)
\begin{tcolorbox}
Spectrogram:
\end{tcolorbox}
\newpage
\item (5 points)
You will now change the signal to sound more like ``Laurel'' or, more like ``Yanny''. To start, take the original audio, and try 1) zeroing out all ``high'' frequencies---frequencies above a certain threshold $t$, and 2) zeroing out all frequencies below $t$. [This can be accomplished by multiplying the Fourier transform of the signal by the thresholding function, then taking the inverse Fourier transform.] Is there a threshold $t$ such that the ``low'' frequency version sounds like ``Laurel'' and the ``high'' frequency version sounds like ``Yanny/Yarry'' to you? If so, state what this frequency is.
\begin{tcolorbox}
Discussion:
\end{tcolorbox}
\newpage
\item (5 points)
Try changing the playback speed (by multiplying the ``sampleRate'' by values in the range $0.25$ to $2$. What effects do you observe?
\begin{tcolorbox}
Discussion:
\end{tcolorbox}
\newpage
\item (5 point bonus)
Can you find other ways of bringing out the Laurel or Yanny in the original audio? You can experiment with modifying the Fourier transform in more complicated ways, including fancier scalings, as well as things like translating the Fourier transform (i.e. appending zeros to the beginning, or removing the first set of values), etc. Discuss any especially effective approaches that you discovered.
\begin{tcolorbox}
Discussion:
\end{tcolorbox}
\newpage
\item (30 point bonus)
Can you create a new, distinct Laurel/Yanny style illusions? One natural research question is the extent to which this Laurel/Yanny phenomenon is an isolated instance. Can you create a different audio signal where one portion of the population perceives one word, and another portion of the population perceives a different word? What fraction of words can be turned into this sort of phenomenon? To show that question is not impossible, in ``WorldsYikes.zip'', we have several audio signals that we constructed which can be perceived as either the word \emph{worlds} or the word \emph{yikes} depending on the playback speed and/or damping of the high frequencies (try playing ``worldsYikes1.wav'', ``worldsYikes2.wav'',\ldots,``worldsYikes10.wav'' until you perceive both ``worlds'' and ``yikes''). For any example you find, include at least 3 or 4 versions of the audio clip so that we can hear both/all perceptions. Upload these files to YouTube and include the links in your solution.
Grading note: This bonus part will be graded in an all-or-nothing manner with no partial credit. If you end up constructing at least one good example (of comparable quality to the laurel/yanny or worlds/yikes examples) and describe your approach, you will get full credit and your example(s) will be presented to the entire class. Such a solution would also be a good step towards a nice research project; if you are keen to pursue this research direction further after the mini-project is due, please let know!
\end{enumerate}
\begin{tcolorbox}
Description of method:\\
Code:\\
Link to files:
\end{tcolorbox}
\newpage
\noindent {\bf Deliverables:} Plots for (b) and (c); description of plot and explanation for (c); image of spectrogram for (d); discussion for (e) and (f) , discussion, possibly including plots for (g); code, description of how to generate the illusions, and Youtube link for (h).
\iffalse
\item \textbf{Creating Echoes.}
In this part, you will try to create echoes and other transforms of your own recording. In addition to recording yourself, you'll need to be able to listen to the sound represented by an array.
The Matlab command \verb|soundsc(data, sampleRate)| allows you to do that; if you're using Python,
you'll need to save the signal array as a \verb|.wav| file and then listen to the file:
\begin{lstlisting}
data = data * 1.0 / np.max(np.abs(data)) # scaling
with open("output.wav", "w") as f:
wavfile.write(f, sampleRate, data)
\end{lstlisting}
\begin{enumerate}[label=(\alph*)]
\item
(Warm-up, do not submit.) Record yourself saying a sentence, such as ``Fast Fourier Transform is magical.'' Denote the signal by $\bf f$.
\item (3 points)
We will now add an echo: what filter (i.e. vector) $\bf f_1$ has the property that ${\bf f} \ast {\bf f_1}$ consists of the original recording, over-layed with an echo of delay $0.25$ seconds at half the volume? Make sure you listen to the signal---it should sound like the original recording but with an echo that is half as loud as the main recording. Plot the original signal $\bf f$ and ${\bf f} \ast {\bf f_1}$.
\item (3 points)
Now imagine you are in a large room with sound-reflecting walls (maybe a cathedral): suppose you are up against one of the walls, and there are 3 other walls, at distances 18 meters, 40 meters, and 53 meters. Assume the closest wall sends back a reflection that is $1/3$rd as loud as the original sound, the second wall's reflection is $1/4$th as loud, and the furthest wall is $1/8$th as loud. Assuming that the speed of sound is 344 meters per second, what filter $\bf f_2$ would you use to simulate the echoes created by the walls? [Note: you should define $\bf f_2$ so that the signal $\bf f + \bf f \ast \bf f_2$ would be what you, the speaker, hears.] Plot $\bf f+ {\bf f} \ast {\bf f_2}$.
\item (4 points)
Now, recall that your back is against the fourth wall--we should also take into account the fact that the echoes will themselves bounce of the wall and have their own echoes. How would you use $\bf f_2$ to create the effect of the echoes of echoes? [Hint: define a filter in terms of $\bf f_2$ that you would convolve the original signal by to obtain the desired signal with the echoes of the echoes.] What about taking into account the echoes of the echoes of the echoes? Listen to the recording, and play around with the distances, and dampening factors.
\item
Bonus: In echoes, higher frequencies are often dampened more than lower frequencies. How do you create a signal that addresses this effect?
\end{enumerate}
\noindent {\bf Deliverables:} $\bf f_1$, your derivation, and plot for (b); $\bf f_2$, your derivation, and plot for (c); discussion for (d); discussion for (e).
\fi
\end{enumerate}
\end{document}