% 
% LaTeX Problem Set Template by Sachin Padmanabhan
% I created this when I was a freshman in CS 103,
% and I continue to use it to this day.
%
% Hope you enjoy!
%
% There may be problems with this template.
% If so, feel free to contact me.
%

% Updated for Fall 2018 by Michael Zhu

\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{mathdots}
\usepackage[pdftex]{graphicx}
\usepackage{fancyhdr}
\usepackage[margin=1in]{geometry}
\usepackage{multicol}
\usepackage{bm}
\usepackage{listings}
\PassOptionsToPackage{usenames,dvipsnames}{color}  %% Allow color names
\usepackage{pdfpages}
\usepackage{algpseudocode}
\usepackage{tikz}
\usetikzlibrary{automata,positioning}
\usepackage{enumitem}
\usepackage[T1]{fontenc}
\usepackage{inconsolata}
\usepackage{framed}
\usepackage{wasysym}
\usepackage[thinlines]{easytable}
\usepackage{wrapfig}
\usepackage{hyperref}
\usepackage{mathrsfs}
\usepackage{tabularx} % also loads 'array' package
\newcolumntype{C}{>{\centering\arraybackslash}X} % centered version of 'X' columns

\title{\vspace{-1.5cm} CS 103: Mathematical Foundations of Computing\\Problem Set \#8}
\author{Your Name(s) Here}
\date{\today}

\lhead{Your Name(s) Here}
\chead{Problem Set \#8}
\rhead{\today}
\lfoot{}
\cfoot{CS 103: Mathematical Foundations of Computing --- Autumn 2018}
\rfoot{\thepage}

\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\absfit}[1]{\left\lvert #1 \right\rvert}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\eval}[3]{\left[#1\right]_{#2}^{#3}}
\renewcommand{\(}{\left(}
\renewcommand{\)}{\right)}
\newcommand{\floor}[1]{\left\lfloor#1\right\rfloor}
\newcommand{\ceil}[1]{\left\lceil#1\right\rceil}
\newcommand{\pd}[1]{\frac{\partial}{\partial #1}}
\newcommand{\inner}[1]{\langle#1\rangle}
\newcommand{\cond}{\bigg|}
\newcommand{\rank}[1]{\mathbf{rank}(#1)}
\newcommand{\range}[1]{\mathbf{range}(#1)}
\newcommand{\nullsp}[1]{\mathbf{null}(#1)}
\newcommand{\repr}[1]{\left\langle#1\right\rangle}
\newcommand\VRule[1][\arrayrulewidth]{\vrule width #1}

\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator{\Tr}{\mathbf{Tr}}
\DeclareMathOperator{\diag}{\mathbf{diag}}
\DeclareMathOperator{\dist}{\mathbf{dist}}
\DeclareMathOperator{\prob}{\mathbf{prob}}
\DeclareMathOperator{\dom}{\mathbf{dom}}
\DeclareMathOperator{\E}{\mathbf{E}}
\DeclareMathOperator{\Q}{\mathbb{Q}}
\DeclareMathOperator{\R}{\mathbb{R}}
\DeclareMathOperator{\var}{\mathbf{var}}
\DeclareMathOperator{\quartile}{\mathbf{quartile}}
\DeclareMathOperator{\conv}{\mathbf{conv}}
\DeclareMathOperator{\VC}{VC}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\Ber}{Bernoulli}
\DeclareMathOperator{\NP}{\mathbf{NP}}
\DeclareMathOperator{\coNP}{\mathbf{coNP}}
\DeclareMathOperator{\TIME}{\mathsf{TIME}}
\DeclareMathOperator{\polytime}{\mathbf{P}}
\DeclareMathOperator{\PH}{\mathbf{PH}}
\DeclareMathOperator{\SIZE}{\mathbf{SIZE}}
\DeclareMathOperator{\ATIME}{\mathbf{ATIME}}
\DeclareMathOperator{\SPACE}{\mathbf{SPACE}}
\DeclareMathOperator{\ASPACE}{\mathbf{ASPACE}}
\DeclareMathOperator{\NSPACE}{\mathbf{NSPACE}}
\DeclareMathOperator{\Z}{\mathbb{Z}}
\DeclareMathOperator{\N}{\mathbb{N}}
\DeclareMathOperator{\EXP}{\mathbf{EXP}}
\DeclareMathOperator{\NEXP}{\mathbf{NEXP}}
\DeclareMathOperator{\NTIME}{\mathbf{NTIME}}
\DeclareMathOperator{\DTIME}{\mathbf{DTIME}}
\DeclareMathOperator{\poly}{poly}
\DeclareMathOperator{\BPP}{\mathbf{BPP}}
\DeclareMathOperator{\ZPP}{\mathbf{ZPP}}
\DeclareMathOperator{\RP}{\mathbf{RP}}
\DeclareMathOperator{\coRP}{\mathbf{coRP}}
\DeclareMathOperator{\BPL}{\mathbf{BPL}}
\DeclareMathOperator{\IP}{\mathbf{IP}}
\DeclareMathOperator{\PSPACE}{\mathbf{PSPACE}}
\DeclareMathOperator{\NPSPACE}{\mathbf{NPSPACE}}
\DeclareMathOperator{\SAT}{\mathsf{SAT}}
\DeclareMathOperator{\NL}{\mathbf{NL}}
\DeclareMathOperator{\PCP}{\mathbf{PCP}}
\DeclareMathOperator{\PP}{\mathbf{PP}}
\DeclareMathOperator{\cost}{cost}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbf{Pr}}

\definecolor{shadecolor}{gray}{0.9}

\theoremstyle{plain}
\newtheorem*{lem}{Lemma}

\theoremstyle{plain}
\newtheorem*{claim}{Claim}

\theoremstyle{definition}
\newtheorem*{answer}{Answer}

\newtheorem{theorem}{Theorem}[section]
\newtheorem*{thm}{Theorem}
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}

\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0.4pt}

\setlength{\parindent}{0pt}

\pagestyle{fancy}

\renewcommand{\thefootnote}{\fnsymbol{footnote}}

% start MZ
\let\oldemptyset\emptyset
\renewcommand{\emptyset}{\text{\O}}
\renewcommand\qedsymbol{$\blacksquare$}
\newenvironment{prf}{{\bfseries Proof.}}{\qedsymbol}
\renewcommand{\emph}[1]{\textit{\textbf{#1}}}
\newcommand{\annotate}[1]{\textit{\textcolor{blue}{#1}}}
\usepackage{stmaryrd}
\usepackage{minted}
\usepackage{mathtools}
\usepackage{alltt}
\newcommand{\ttt}[1]{\texttt{#1}}
\newcommand{\match}[1]{\textbf{\underline{#1}}}
\usepackage{parskip}
\newcommand{\lcurly}{\textbf{\ttt{\{}}}
\newcommand{\rcurly}{\textbf{\ttt{\}}}}
% end MZ


\begin{document}

\maketitle

\section{Designing CFGs}

For each of the following languages, design a CFG for that language. \textit{\textbf{Please use our online tool to design, test, and submit the CFGs in this problem}}. To use it, visit the CS103 website and click the ``CFG Editor'' link under the ``Resources'' header. You should only have one member from each team submit your grammars; tell us who this person is when you submit the rest of the problems through GradeScope. 

\begin{enumerate}[label*=\roman*.,ref=\roman*]
    \item Given $\Sigma = \{\ttt{a}, \ttt{b}, \ttt{c}\}$, write a CFG for the language $\{\ w \in \Sigma^*\ |\ w$ contains \ttt{aa} as a substring $\}$. For example, the strings \ttt{aa}, \ttt{baac}, and \ttt{ccaabb} are all in the language, but \ttt{aba} is not.
    
    \item Given $\Sigma = \{\ttt{a}, \ttt{b}\}$, write a CFG for the language $\{\ w \in \Sigma^*\ |\ w$ is \emph{not} a palindrome $\}$, the language of strings that are not the same when read forwards and backwards. For example, $\ttt{aab} \in L$ and $\ttt{baabab} \in L$, but $\ttt{aba} \not\in L$, $\ttt{bb} \not\in L$, and $\varepsilon \not\in L$. 
    
    \annotate{Don't try solving this one by starting with the CFG for palindromes and making modifications to it. In general, there's no way to mechanically turn a CFG for a language L into a CFG for the language $\overline{L}$, since the context-free languages aren't closed under complementation. However, the idea of looking at the first and last characters of a given string might be a good idea.}
    
    \item Let $\Sigma$ be an alphabet containing these symbols:
    \begin{equation*}
    \emptyset \quad\quad \N \quad\quad \{ \quad\quad \} \quad\quad , \quad\quad \cup
    \end{equation*}
    We can form strings from these symbols which represent sets. Here's some examples: 
    \begin{center}
    \begin{tabularx}{12cm}{X X X X}
    $\emptyset$ & $\{\emptyset, \N\} \cup \N \cup\ \emptyset$ & $\{\emptyset\} \cup \N \cup \{\N\}$ & $\{\emptyset, \emptyset, \emptyset\}$ \\[0.1cm]
    $\{\{\N, \emptyset\} \cup \{\emptyset\}\}$ & $\N \cup \{\N, \emptyset\}$ & $\{\}$ & $\{\N\}$ \\[0.1cm]
    $\{\emptyset, \{\emptyset, \{\emptyset\}\}\}$ & $\{\{\{\{\N\}\}\}\}$ & $\N$ & $\{\emptyset, \{\}\}$
    \end{tabularx}
    \end{center}
    Notice that some of these sets, like $\{\emptyset, \emptyset\}$ are syntactically valid but redundant, and others like $\{\}$ are syntactically valid but not the cleanest way of writing things. Here's some examples of strings that don't represent sets or aren't syntactically valid:
    \begin{center}
    \begin{tabularx}{12cm}{X X X X}
    $\varepsilon$ & $\}\emptyset\{$ & $\emptyset\{\N\}$ & $\{\{\}$ \\[0.1cm]
    $\N, \emptyset, \{\emptyset\}$ & $\{, \N\}$ & $\{ \N \emptyset \},$ & $\{,\}$ \\[0.1cm]
    $\{\emptyset$ & $\}\} \N$ & $\{ \emptyset, \emptyset, \emptyset, \}$ & $\{\N, , , \emptyset\}$
    \end{tabularx}
    \end{center}
    Write a CFG for the language $\{\ w \in \Sigma^*\ |\ w$ is a syntactically valid string representing a set $\}$. \\
    When using the CFG tool, please use the letters \ttt{n}, \ttt{u}, and \ttt{o} in place of $\N, \cup,$ and $\emptyset$, respectively. 
    
    \emph{Fun fact}: The starter files for Problem Set One contain a parser that's designed to take as input a string representing a set and to reconstruct what set that is. The logic we wrote to do that parsing was based on a CFG we wrote for sets and set theory. Take CS143 if you're curious how to go
    from a grammar to a parser! 
    
    \annotate{Test your CFG thoroughly! In Fall 2017, a quarter of the submissions we received weren't able to derive the string $\{\emptyset, \emptyset, \emptyset\}$.}
    
    \annotate{As a hint, as is often the case when writing CFGs, we recommend that you use different nonterminals to represent different components of the string. For example, structure of a comma-separated list is very different from the structure of an expression combining multiple sets together.}
\end{enumerate}

\begin{shaded}
Write the SUNetID of the person who submitted the CFGs here. 
\end{shaded}

\pagebreak

\section{The Complexity of Addition}

This problem explores the following question:
\begin{center}
\emph{How hard is it to add two numbers?}
\end{center}
Suppose that we want to check whether $x + y = z$, where $x, y,$ and $z$ are all natural numbers. If we want to phrase this as a problem as a question of strings and languages, we will need to find some way to standardize our notation. In this problem, we will be using the \textit{\textbf{unary number system}}, a number system in which the number $n$ is represented by writing out $n$ 1's. For example, the number 5 would be written as 11111, the number 7 as 1111111, and the number 12 as 111111111111. 

Given the alphabet $\Sigma = \{\ttt{1}, \ttt{+}, \ttt{=}\}$, we can consider strings encoding $x + y = z$ by writing out $x, y,$ and $z$ in
unary. For example: 
\begin{align*}
4 + 3 &= 7 \text{ would be encoded as }\ttt{111+1111=1111111} \\
7 + 1 &= 8 \text{ would be encoded as }\ttt{1111111+1=11111111} \\
0 + 1 &= 1 \text{ would be encoded as }\ttt{+1=1}
\end{align*}
Consider the alphabet $\Sigma = \{1, +, =\}$ and the following language, which we'll call \textit{ADD}:
\begin{equation*}
\{\ \ttt{1}^m\ttt{+1}^n\ttt{=1}^{m+n} \mid m, n \in \N\ \}
\end{equation*}
For example, the strings \ttt{111+1=1111} and \ttt{+1=1} are in the language, but \ttt{1+11=11} is not, nor is the string \ttt{1+1+1=111}.

\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Prove or disprove: the language \textit{ADD} defined above is regular.
\begin{shaded}
Provide a proof or disproof here. 
\end{shaded}
\item Write a context-free grammar for \textit{ADD}, showing that \textit{ADD} is context-free. (Please submit your CFG online.)

\annotate{You may find it easier to solve this problem if you first build a CFG for this language where you're allowed to have as many numbers added together as you'd like. Once you have that working, think about how you'd modify it so that you have exactly two numbers added together on the left-hand side of the equation.}

\begin{shaded}
Write the SUNetID of the person who submitted the CFG here. 
\end{shaded}    
\end{enumerate}

\pagebreak

\section{The Complexity of Pet Ownership}

This problem explores the following question:
\begin{center}
\textit{\textbf{How hard is it to walk your dog without a leash?}}
\end{center}
Let's imagine that you're going for a walk with your dog, but this time don't have a leash. As in Problem Set Six and Problem Set Seven, let $\Sigma = \{\ttt{y}, \ttt{d}\}$, where $\ttt{y}$ means that you take a step forward and $\ttt{d}$ means that your dog takes a step forward. A string in $\Sigma^*$ can be thought of as a series of events in which either you or your dog moves forward one unit. For example, the string $\ttt{yydd}$ means that you take two steps forward, then your dog takes two steps forward. 

Let \textit{DOGWALK} $ = \{w \in \Sigma^*\ |\ w$ describes a series of steps where you and your dog arrive at the same point$\}$. For example, the strings \ttt{yyyddd}, \ttt{ydyd}, and \ttt{yyyddddddyyy} are all in \textit{DOGWALK}.

\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Prove or disprove: the language \textit{DOGWALK} defined above is regular.
\begin{shaded}
Provide a proof or disproof here. 
\end{shaded}
\item Write a context-free grammar for \textit{DOGWALK}, showing that \textit{DOGWALK} is context-free. (Please submit your CFG online.)

\textit{\textcolor{blue}{ Be careful, and test your CFG! As you saw in lecture, a lot of ideas that seem plausible here don't work. }}
\begin{shaded}
Write the SUNetID of the person who submitted the CFG here. 
\end{shaded}
\end{enumerate}

\pagebreak

\section{The Complexity of RNA Hairpins}

RNA strands consist of strings of \textit{nucleotides}, molecules which encode genetic information. Computational biologists typically represent each RNA strand as a string made from four different letters, \ttt{A}, \ttt{C}, \ttt{G}, and \ttt{U}, each of which represents one of the four possible nucleotides. 

Each of the the four nucleotides has an affinity for a specific other nucleotide. Specifically: 
\begin{center}
\ttt{A} has an affinity for \ttt{U} (and vice-versa) \quad\quad\quad\quad \ttt{C} has an affinity for \ttt{G} (and vice-versa)
\end{center}
This can cause RNA strands to fold over and bind with themselves. Consider this RNA strand:
\begin{center}
\begin{tikzpicture}[auto,node distance=1cm, minimum size=0.7cm,
  thick,main node/.style={circle,draw}]

  \node[main node] (1) {G};
  \node[main node] (2) [right of=1] {A};
  \node[main node] (3) [right of=2] {U};
  \node[main node] (4) [right of=3] {U};
  \node[main node] (5) [right of=4] {A};
  \node[main node] (6) [right of=5] {C};
  \node[main node] (7) [right of=6] {A};
  \node[main node] (8) [right of=7] {G};
  \node[main node] (9) [right of=8] {G};
  \node[main node] (10) [right of=9] {U};
  \node[main node] (11) [right of=10] {A};
  \node[main node] (12) [right of=11] {A};
  \node[main node] (13) [right of=12] {U};
  \node[main node] (14) [right of=13] {C};

  \path[line width=1mm, style={font=\sffamily\small}]
    (1) edge node {} (2)
    (2) edge node {} (3)
    (3) edge node {} (4)
    (4) edge node {} (5)
    (5) edge node {} (6)
    (6) edge node {} (7)
    (7) edge node {} (8)
    (8) edge node {} (9)
    (9) edge node {} (10)
    (10) edge node {} (11)
    (11) edge node {} (12)
    (12) edge node {} (13)
    (13) edge node {} (14);
\end{tikzpicture} 
\end{center}
If you perfectly fold this RNA strand in half, you get the following:
\begin{center}
\begin{tikzpicture}[auto,node distance=1cm, minimum size=0.7cm,
  thick,main node/.style={circle,draw}]

  \node[main node] (1) {G};
  \node[main node] (2) [right of=1] {A};
  \node[main node] (3) [right of=2] {U};
  \node[main node] (4) [right of=3] {U};
  \node[main node] (5) [right of=4] {A};
  \node[main node] (6) [right of=5] {C};
  \node[main node] (7) [right of=6] {A};
  \node[main node] (8) [below of=7] {G};
  \node[main node] (9) [left of=8] {G};
  \node[main node] (10) [left of=9] {U};
  \node[main node] (11) [left of=10] {A};
  \node[main node] (12) [left of=11] {A};
  \node[main node] (13) [left of=12] {U};
  \node[main node] (14) [left of=13] {C};

  \path[line width=1mm, style={font=\sffamily\small}]
    (1) edge node {} (2)
    (2) edge node {} (3)
    (3) edge node {} (4)
    (4) edge node {} (5)
    (5) edge node {} (6)
    (6) edge node {} (7)
    (7) edge node {} (8)
    (8) edge node {} (9)
    (9) edge node {} (10)
    (10) edge node {} (11)
    (11) edge node {} (12)
    (12) edge node {} (13)
    (13) edge node {} (14);
\end{tikzpicture} \quad\quad
\begin{tikzpicture}[auto,node distance=1cm, minimum size=0.7cm,
  thick,main node/.style={circle,draw}]

  \node[main node] (1) {G};
  \node[main node] (2) [right of=1] {A};
  \node[main node] (3) [right of=2] {U};
  \node[main node] (4) [right of=3] {U};
  \node[main node] (5) [right of=4] {A};
  \node[main node] (6) [right of=5] {C};
  \node[main node] (7) [right of=6] {A};
  \node[main node] (8) [below of=7] {G};
  \node[main node] (9) [left of=8] {G};
  \node[main node] (10) [left of=9] {U};
  \node[main node] (11) [left of=10] {A};
  \node[main node] (12) [left of=11] {A};
  \node[main node] (13) [left of=12] {U};
  \node[main node] (14) [left of=13] {C};

  \path[line width=1mm, style={font=\sffamily\small}]
    (1) edge node {} (2)
         edge[line width=0.5mm,dotted] node {} (14)
    (2) edge node {} (3)
         edge[line width=0.5mm,dotted] node {} (13)
    (3) edge node {} (4)
         edge[line width=0.5mm,dotted] node {} (12)
    (4) edge node {} (5)
    	 edge[line width=0.5mm,dotted] node {} (11)
    (5) edge node {} (6)
    	 edge[line width=0.5mm,dotted] node {} (10)
    (6) edge node {} (7)
    	 edge[line width=0.5mm,dotted] node {} (9)
    (7) edge node {} (8)
    (8) edge node {} (9)
    (9) edge node {} (10)
    (10) edge node {} (11)
    (11) edge node {} (12)
    (12) edge node {} (13)
    (13) edge node {} (14);
\end{tikzpicture} 
\end{center}
Notice that each pair of nucleotides -- except for the \ttt{A} and the \ttt{G} on the far right -- are attracted to the corresponding nucleotide on the other side of the RNA strand. Because of the natural affinities of the nucleotides in the RNA strand, the RNA strand will be held in this shape. This is an example of an \textit{RNA hairpin}, a structure with important biological roles. 

For the purposes of this problem, we'll say that an RNA strand forms a hairpin if
\begin{itemize}
\item it has even length (so that it can be cleanly folded in half);
\item it has length at least ten (there are at least four pairs holding the hairpin shut); and
\item all of its nucleotides, except for the middle two, have an affinity for its corresponding nucleotide when folded over. (The middle two nucleotides in a hairpin might coincidentally have an affinity for one another, but it's not required. For example, \ttt{CCCC\underline{AU}GGGG} forms a hairpin.)
\end{itemize}
This problem explores the following question:
\begin{center}
\textit{\textbf{How hard is it to determine whether an RNA strand forms a hairpin?}}
\end{center}
Let $\Sigma = \{\ttt{A}, \ttt{C}, \ttt{G}, \ttt{U}\}$ and let $L_{\textit{RNA}} = \{\ w \in \Sigma^*\ |\ w$ represents an RNA strand that forms a hairpin $\}$. For example, the strings \ttt{UGACCCGUCA}, \ttt{GUACAAGUAC}, \ttt{UUUUUUUUUAAAAAAAAA}, and \ttt{CCAACCUUGG} are all in $L_{\textit{RNA}}$, but the strings \ttt{AU}, \ttt{AAAAACUUUUU}, \ttt{GGGC}, and \ttt{GUUUUAAAAG} are all not in $L_{\textit{RNA}}$.

Design a CFG for $L_{\textit{RNA}}$, which proves that the language is context-free. Please submit your grammar online. (This language turns out to not be regular, though the proof of that result using the Myhill-Nerode theorem is heavy on details and light on the intuition, so we won't ask you to do that here.)

\begin{shaded}
Write the SUNetID of the person who submitted the CFG here. 
\end{shaded}


\pagebreak

\section{Equivalence Classes and Regular Languages, Part Two}

On Problem Set Seven, you explored the \emph{indistinguishability} relation for $L$, denoted $\equiv_L$, defined as
\begin{center}
$x \equiv_L y \quad \text{if} \quad \forall w \in \Sigma^*.(xw \in L \leftrightarrow yw \in L).$
\end{center}
You specifically proved that for any language $L$, the relation $\equiv_L$ is an equivalence relation and that any DFA for $L$ must have at least $I(\equiv_L)$ states. In this problem, you're going to prove an amazing result:
\begin{center}
    \emph{Theorem:} If $L$ is a language where $I(\equiv_L)$ is finite, then $L$ is regular.
\end{center}
In other words, if you know absolutely nothing about a language other than there are finitely many equivalence classes of the $\equiv_L$ relation, then somewhere out there, there must be a DFA for $L$!

Let $L$ be an arbitrary language over some alphabet $\Sigma$ where $I(\equiv_L)$ is finite. We are going to prove that $L$ is regular by defining a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ for this language $L$. The key insight behind this proof is how to choose $Q$. Specifically, we will choose $Q$ to be the set of equivalence classes of $\equiv_L$:
\begin{center}
    $Q = \{ [w]_{\equiv_L} \mid w \in \Sigma^* \}.$
\end{center}
It might seem strange to have the states of a DFA be sets, but then again, you've seen something like this before. In Problem Set Six, when working through the subset construction, you created a DFA whose states literally were sets of states of some particular NFA.

\begin{enumerate}[label=\roman*.]
    \item Explain why $Q$ is finite. This should take you at most a sentence or two.
    
    \begin{shaded}
    Write your explanation here. 
    \end{shaded}
\end{enumerate}

We now need to figure out how to pick a start state and wire up our transitions. Our goal will be to define $q_0$ and $\delta$ so that our DFA has the following property: if you run $w$ through this DFA, the state you end up in corresponds to $[w]_{\equiv_L}$. It turns out that choosing $q_0$ and $\delta$ as follows makes this work:
\begin{center}
    $q_0 = [\varepsilon]_{\equiv_L} \qquad\qquad\qquad \delta([x]_{\equiv_L}, a) = [xa]_{\equiv_L}.$
\end{center}
Of course, you shouldn't take our word for it. You should prove that these choices make everything work!

\begin{enumerate}[resume*]
    \item Prove that for any string $w \in \Sigma^*$, we have $\delta^*(w) = [w]_{\equiv_L}$.
    
    \annotate{Need a refresher on the definition of $\delta^*$? Check Problem Set Seven.}

    \begin{shaded}
    Provide a proof here. 
    \end{shaded}
\end{enumerate}

To seal the deal, we need to choose our set of accepting states. We'll define $F$ as follows:
\begin{center}
    $F = \{[w]_{\equiv_L} \mid \exists x \in [w]_{\equiv_L}. x \in L \}.$
\end{center}
In other words, $F$ is the set of all equivalence classes containing at least one string in $L$.

\begin{enumerate}[resume*]
    \item On Problem Set Seven, you saw that we can formally define $\mathscr{L}(D) = \{ w \in \Sigma^* \mid \delta^*(w) \in F \}$. Prove that with this choice of $F$, we have $\mathscr{L}(D) = L$.

    \annotate{There is a ton of formal notation here, but at the end of the day, this question is just asking you to prove that two sets are equal. Think way back to Problem Set One. What's the easiest way to do this?}

    \annotate{Your proof should use the formal definitions provided here rather than higher-level concepts like ``the DFA accepts w'' or ``run the DFA on w.'' Also, perhaps a result from Problem Set Seven would be useful here?}
    
    \begin{shaded}
    Provide a proof here. 
    \end{shaded}
\end{enumerate}

By combining the two theorems you've explored about indistinguishability -- the one you proved last time, and the one from above -- we get this fundamental result:
\begin{center}
    \emph{Theorem (Myhill-Nerode):} A language $L$ is regular if and only if $I(\equiv_L)$ is finite. Furthermore, if $I(\equiv_L)$ is finite, the smallest possible DFA for $L$ has exactly $I(\equiv_L)$ states.
\end{center}
This result formalizes the intuition we've had about regular languages corresponding to problems you can solve with only finite memory. The ``memory'' you need corresponds to remembering which equivalence class the string you've seen so far happens to fall into.

If you talk to CS theory folk and mention ``the Myhill-Nerode theorem,'' they'll assume you're talking about the above theorem! The version we saw in lecture is just a special case of this more general one.

\pagebreak

\section{The Collatz Conjecture}
\newcommand{\ttbf}[1]{\texttt{\textbf{#1}}}
\newcommand{\1}{\ttbf{1}}

The \emph{Collatz conjecture} is a famous conjecture (an unproved claim) that says the following procedure (called the \emph{hailstone sequence}) terminates for all positive natural numbers $n$:
\begin{itemize}
\item If $n = 1$, stop.
\item If $n$ is even, set $n = n / 2$ and repeat from the top.
\item If $n$ is odd, set $n = 3n + 1$ and repeat from the top.
\end{itemize}
Let $L = \{\1^n \mid n \geq 1$ and the hailstone sequence terminates for $n \}$ be a language over the singleton alphabet $\Sigma = \{\1\}$. It turns out that it's possible to build a TM for this language which means that $L \in$ \textbf{RE}, and in this problem you'll do just that. Parts (i) and (ii) will ask you to design two useful subroutines, and you'll assemble the overall machine in part (iii).

\begin{enumerate}[label*=\roman*.,ref=\roman*]
    \item Design a TM subroutine that, given a tape holding a string of the form $\1^{2n}$ (where $n \in \N$) surrounded by infinitely many blanks, ends with $\1^n$ written on the tape, surrounded by infinitely many blanks. You can assume the tape head begins reading the first \1 (or points to an arbitrary blank cell in the case where $n = 0$), and your TM should end with the tape head reading the first \1 of the result (or any blank cell if $n = 0$). For example, given the initial configuration 
\begin{center}
\tikzset{tape/.style={fill=yellow!20}}
\begin{tikzpicture}[scale=0.5]
% tape
\draw[tape] (0,0) rectangle (1,1) node[pos=.5] {$\ldots$};
\draw[tape] (1,0) rectangle (2,1);
\foreach \x in {2,...,9} { % cells with 1s 
      \draw[tape] (\x,0) rectangle (\x+1,1) node[pos=.5] {\1};
}
\draw[tape] (10,0) rectangle (11,1);
\draw[tape] (11,0) rectangle (12,1) node[pos=.5] {$\ldots$};

% arrow
\def\x{2.5} % x-coord of bottom point of arrow
\def\y{1} % y-coord of bottom point of arrow
\coordinate (0) at (\x,\y); 
\coordinate (1) at (\x+0.5, \y+0.5); 
\coordinate (2) at (\x+0.25, \y+0.5); 
\coordinate (3) at (\x+0.25, \y+1); 
\coordinate (4) at (\x-0.25, \y+1); 
\coordinate (5) at (\x-0.25, \y+0.5); 
\coordinate (6) at (\x-0.5, \y+0.5);
\filldraw[draw=black, fill=black] (0) -- (1) -- (2) -- (3) -- (4) -- (5) -- (6) -- cycle; 
\end{tikzpicture}
\end{center}
your TM subroutine would end with this configuration:
\begin{center}
\tikzset{tape/.style={fill=yellow!20}}
\begin{tikzpicture}[scale=0.5]
% tape
\draw[tape] (0,0) rectangle (1,1) node[pos=.5] {$\ldots$};
\draw[tape] (1,0) rectangle (2,1);
\foreach \x in {2,...,5} { % cells with 1s 
      \draw[tape] (\x,0) rectangle (\x+1,1) node[pos=.5] {\1};
}
\foreach \x in {6,...,10} { % blank cells 
      \draw[tape] (\x,0) rectangle (\x+1,1);
}
\draw[tape] (11,0) rectangle (12,1) node[pos=.5] {$\ldots$};

% arrow
\def\x{2.5} % x-coord of bottom point of arrow
\def\y{1} % y-coord of bottom point of arrow
\coordinate (0) at (\x,\y); 
\coordinate (1) at (\x+0.5, \y+0.5); 
\coordinate (2) at (\x+0.25, \y+0.5); 
\coordinate (3) at (\x+0.25, \y+1); 
\coordinate (4) at (\x-0.25, \y+1); 
\coordinate (5) at (\x-0.25, \y+0.5); 
\coordinate (6) at (\x-0.5, \y+0.5);
\filldraw[draw=black, fill=black] (0) -- (1) -- (2) -- (3) -- (4) -- (5) -- (6) -- cycle; 
\end{tikzpicture}
\end{center}
You can assume that there are an even number of \1s on the tape at startup and can have your TM behave however you'd like if this isn't the case. Please use our provided TM editor to design, develop, test, and submit your answer to this question. Since our TM tool doesn't directly support subroutines, just have your machine accept when it's done. 

\annotate{For reference, our solution has fewer than 10 states. If you have significantly more than this and are struggling to get your TM working, you might want to change your approach. It's totally fine if you have a bunch of states, provided that your solution works.}

\annotate{There are a lot of different solutions here. Some use very little extra tape. Some use a lot of extra tape. Some don't need any other tape symbols. Some do. Be creative, try things out, and don't be afraid to back up and try something else if your approach doesn't seem to be panning out.}

\begin{center}
    \annotate{(Continued on the next page...)}
\end{center}

\pagebreak

\item Design a TM subroutine that, given a tape holding a string of the form $\1^n$ (for some $n \in \N$), surrounded by infinitely many blanks, ends with $\1^{3n+1}$ written on the tape, surrounded by infinitely many blanks. You can assume that the tape head begins reading the first \1, and your TM should end with the tape head reading the first \1 of the result. For example, given this configuration
\begin{center}
\tikzset{tape/.style={fill=yellow!20}}
\begin{tikzpicture}[scale=0.5]
% tape
\draw[tape] (0,0) rectangle (1,1) node[pos=.5] {$\ldots$};
\draw[tape] (1,0) rectangle (2,1);
\foreach \x in {2,...,4} { % cells with 1s 
      \draw[tape] (\x,0) rectangle (\x+1,1) node[pos=.5] {\1};
}
\foreach \x in {5,...,12} { % blank cells 
      \draw[tape] (\x,0) rectangle (\x+1,1);
}
\draw[tape] (13,0) rectangle (14,1) node[pos=.5] {$\ldots$};

% arrow
\def\x{2.5} % x-coord of bottom point of arrow
\def\y{1} % y-coord of bottom point of arrow
\coordinate (0) at (\x,\y); 
\coordinate (1) at (\x+0.5, \y+0.5); 
\coordinate (2) at (\x+0.25, \y+0.5); 
\coordinate (3) at (\x+0.25, \y+1); 
\coordinate (4) at (\x-0.25, \y+1); 
\coordinate (5) at (\x-0.25, \y+0.5); 
\coordinate (6) at (\x-0.5, \y+0.5);
\filldraw[draw=black, fill=black] (0) -- (1) -- (2) -- (3) -- (4) -- (5) -- (6) -- cycle; 
\end{tikzpicture}
\end{center}
your TM subroutine would end with this confguration:
\begin{center}
\tikzset{tape/.style={fill=yellow!20}}
\begin{tikzpicture}[scale=0.5]
% tape
\draw[tape] (0,0) rectangle (1,1) node[pos=.5] {$\ldots$};
\draw[tape] (1,0) rectangle (2,1);
\foreach \x in {2,...,11} { % cells with 1s 
      \draw[tape] (\x,0) rectangle (\x+1,1) node[pos=.5] {\1};
}
\draw[tape] (12,0) rectangle (13,1);
\draw[tape] (13,0) rectangle (14,1) node[pos=.5] {$\ldots$};

% arrow
\def\x{2.5} % x-coord of bottom point of arrow
\def\y{1} % y-coord of bottom point of arrow
\coordinate (0) at (\x,\y); 
\coordinate (1) at (\x+0.5, \y+0.5); 
\coordinate (2) at (\x+0.25, \y+0.5); 
\coordinate (3) at (\x+0.25, \y+1); 
\coordinate (4) at (\x-0.25, \y+1); 
\coordinate (5) at (\x-0.25, \y+0.5); 
\coordinate (6) at (\x-0.5, \y+0.5);
\filldraw[draw=black, fill=black] (0) -- (1) -- (2) -- (3) -- (4) -- (5) -- (6) -- cycle; 
\end{tikzpicture}
\end{center}
You can assume the number of \1s on the tape at startup is odd and can have your TM behave however you'd like if this isn't the case. Please use our provided TM editor to design, develop, test, and submit your answer to this question. Since our TM tool doesn't directly support subroutines, just have your machine accept when it's done. 

\annotate{For reference, our solution has fewer than 10 states. If you have significantly more than this, you might want to change your approach.}

\item Draw the state transition diagram for a Turing machine M that recognizes $L$. Our TM tool is configured for this problem so that you can use our reference solutions for parts (i) and (ii) as subroutines in your solution. To do so, follow these directions:

\begin{enumerate}[label=\arabic*.,ref=\arabic*]
    \item Create states named \ttbf{half}, \ttbf{half\_}, \ttbf{trip}, and \ttbf{trip\_}.
    
    \item To execute the subroutine that converts $\1^{2n}$ into $\1^n$, transition into the state named \ttbf{half}. When that subroutine finishes, the TM will automagically jump into the state labeled \ttbf{half\_}. You do not need to -- and should not -- define any transitions into \ttbf{half\_} or out of \ttbf{half}.
    
    \item To execute the subroutine that converts $1^n$ into $1^{3n+1}$, transition into the state named \ttbf{trip}. When that subroutine finishes, the TM will automagically jump into the state labeled \ttbf{trip\_}. You do not need to -- and should not -- define any transitions into \ttbf{trip\_} or out of \ttbf{trip}.
\end{enumerate}

A TM $M$ recognizes a language $L$ if $M$ accepts all of the strings in $L$ and either rejects or loops on all strings that are not in $L$. In other words, your TM should accept every string in $L$, and for any string not in $L$ it can either loop infinitely or reject the string.

Please use our provided TM editor to design, develop, test, and submit your answer to this question.

\annotate{For reference, our solution has fewer than 15 states. If you have significantly more than this,
you might want to change your approach.}

\annotate{For those of you who did the Hailstone Sequence problem in CS106A -- you're now solving the same problem using pure mathematics! Did you expect you'd ever get to do something like that?}

\annotate{Because TMs can go into infinite loops, our provided TM simulator will only simulate TMs for some fixed, large number of steps. Some inputs to your TM might take a very long time to complete purely because the Hailstone Sequence takes a long time to complete in those cases. When that happens, you'll get timeouts reported. It's probably nothing to worry about if you're seeing timeouts for very large or long strings, but you shouldn't be seeing timeouts for, say, strings whose lengths are between 5 and 10.}

\begin{shaded}
Write the SUNetID of the person who submitted the TMs here. 
\end{shaded}
\end{enumerate}

\pagebreak

\section{TMs, Formally}

Just as it's possible to formally define a DFA using a 5-tuple, it's possible to formally define a TM as an 8-tuple $(Q, \Sigma, \Gamma, B, q_0, Y, N, \delta)$ where
\begin{itemize}
    \item $Q$ is a finite set of states, which can be anything;
    \item $\Sigma$ is a finite, nonempty set called the \textit{\textbf{input alphabet}};
    \item $\Gamma$ is a finite, nonempty set called the \textit{\textbf{tape alphabet}}, where $\Sigma \subseteq \Gamma$;
    \item $B \in \Gamma - \Sigma$ is the \textit{\textbf{blank symbol}};
    \item $q_0$ is the start state, where $q_0 \in Q$;
    \item $Y \subseteq Q$ is the set of \textit{\textbf{accepting states}};
    \item $N \subseteq Q$ is the set of \textit{\textbf{rejecting states}}, where $Y \cap N = \emptyset$; and
    \item $\delta$ is the \textit{\textbf{transition function}}, described below.
\end{itemize}

Remember that the definition is the official arbiter of what is legal and what isn’t, so if the definition doesn’t preclude something, it’s legal regardless of how counterintuitive or weird it might be. This question explores some aspects of the definition.

\begin{enumerate}[label*=\roman*.,ref=\roman*]
    \item Is it possible to have a TM with no states? Justify your answer.
    \begin{shaded}
    Provide your answer and justification here. 
    \end{shaded}
    
    \item Is it possible to have a TM with no \textit{accepting} states? Justify your answer.
    \begin{shaded}
    Provide your answer and justification here. 
    \end{shaded}
    
    \item Is it possible to have a TM with no \textit{rejecting} states? Justify your answer.
    \begin{shaded}
    Provide your answer and justification here. 
    \end{shaded}
    
    \item Why is the restriction $Y \cap N = \emptyset$ there? Justify your answer.
    \begin{shaded}
    Provide your answer and justification here. 
    \end{shaded}
    
    \item Is it possible to have a TM where $\Sigma = \Gamma$? Justify your answer.
    \begin{shaded}
    Provide your answer and justification here. 
    \end{shaded}
\end{enumerate}

Now, let's talk about the transition function. As with DFAs, the transition function of a Turing machine is what formally defines the transitions. If $q$ is a state in a TM that isn't an accepting state or a rejecting state and $a$ is a symbol that can appear on the TM's tape, then
\begin{equation*}
\delta(q, a) = (r, b, D)
\end{equation*}
where $r$ is the new state to transition into, $b$ is the symbol to write back to the tape, and $D$ is either $L$ for ''move left'' or $R$ for ''move right.'' Because TMs immediately stop running after entering an accepting or rejecting state, the $\delta$ function should not be defined for any state $q$ that's either accepting or rejecting. Aside from this, $\delta$ should be defined for every combination of a (non-accepting, non-rejecting) state $q$ and any symbol $a$ that can appear on the tape.

\begin{enumerate}[resume*]
\item Based on the above description of $\delta$, what should the domain of $\delta$ be? What should it codomain
be? Answer this question by filling in the following blanks, and briefly justify your answer.

\annotate{In class, we said that any missing transitions in a TM implicitly reject. By that, we didn't mean that the TM's transition function can be undefined on certain inputs. Instead, it means ``if we don't draw in a transition in a picture representing the TM, it means that the transition does exist and goes to a rejecting state, but we were just too lazy to draw it in.'' So you should assume that every transition not drawn in a picture of a TM really is there and really goes to some rejecting state.}

\annotate{Also, take a moment to appreciate the fact that you can read the notation in this question and understand what it means! Could you imagine doing that at the start of the quarter?}

\begin{shaded}
\begin{center}
$\delta$: \underline{Write your answer here} $\rightarrow$  \underline{and here.}
\end{center}
Provide justification here.
\end{shaded}
\end{enumerate}

\pagebreak


\section{Regular and Decidable Languages} 

In class, we alluded to the fact that \textbf{REG} (the class of all regular languages) is a subset of \textbf{R} (the class of all decidable languages), but we never actually justified this claim.

Describe a construction that, given a DFA $D$, produces a decider $D'$ where $\mathscr{L}(D) = \mathscr{L}(D')$. Briefly justify why the TM $D'$ you construct is a decider and why it accepts precisely the strings that $D$ accepts. Illustrate your example by applying it to a small DFA $D$ of your choice. 

Although you have a formal 5-tuple definition of a DFA and a formal 8-tuple definition of a TM at your disposal, we're not expecting you to write your solution at that level of detail. 

\annotate{Remember that DFAs and TMs work completely differently with regards to accepting and rejecting states and that the transitions in TMs have a very different structure than the transitions in DFAs!}

\begin{shaded}
Provide an answer here. 
\end{shaded}

\pagebreak


\section{What Does it Mean to Solve a Problem?}

Let $L$ be a language over $\Sigma$ and $M$ be a TM with input alphabet $\Sigma$. Below are three potential traits of $M$: 

\begin{center}
    \begin{minipage}[t]{.6\textwidth}
    \begin{enumerate}
        \item $M$ halts on all inputs.
        \item For any string $w \in \Sigma^*$, if $M$ accepts $w$, then $w \in L$.
        \item For any string $w \in \Sigma^*$, if $M$ rejects $w$, then $w \not\in L$.\\
    \end{enumerate}
    \end{minipage}
\end{center} 

At some level, for a TM to claim to solve a problem, it should have at least some of these properties. Interestingly, though, just having two of these properties doesn't say much. 

\begin{enumerate}[label*=\roman*.,ref=\roman*]
    \item Prove that if $L$ is any language over $\Sigma$, then there is a TM $M$ that satisfies properties (1) and (2).
    \begin{shaded}
    Provide a proof here. 
    \end{shaded}
    
    \item Prove that if $L$ is any language over $\Sigma$, then there is a TM $M$ that satisfies properties (1) and (3).
    \begin{shaded}
    Provide a proof here. 
    \end{shaded}
    
    \item Prove that if $L$ is any language over $\Sigma$, then there is a TM $M$ that satisfies properties (2) and (3).
    \begin{shaded}
    Provide a proof here. 
    \end{shaded}
    
    \item Suppose that $L$ is a language over $\Sigma$ for which there is a TM $M$ that satisfies properties (1), (2), and (3). What can you say about $L$? Prove it.
    \begin{shaded}
    Provide your answer and proof here.
    \end{shaded}
\end{enumerate}

\annotate{The whole point of this problem is to show that you have to be extremely careful about how you define ``solving a problem,'' since if you define it incorrectly then you can ``solve'' a problem in a way that bears little resemblance to what we'd think of as solving a problem. Keep this in mind as you work through this one.}



\section*{Optional Fun Problem One: TMs and Regular Languages (Extra Credit)}

Let $M$ be a TM with the following property: there exists a natural number $k$ such that after $M$ is run on any string $w$, $M$ always halts after at most $k$ steps. Prove that $\mathscr{L}(M)$ is regular.

\begin{shaded}
Provide a proof here. 
\end{shaded}

\end{document}