% 
% LaTeX Problem Set Template by Sachin Padmanabhan
% I created this when I was a freshman in CS 103,
% and I continue to use it to this day.
%
% Hope you enjoy!
%
% There may be problems with this template.
% If so, feel free to contact me.
%

% Update for Fall 2018 by Michael Zhu
% Update for Winter 2020 by Cynthia Lee

\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{mathdots}
\usepackage[pdftex]{graphicx}
\usepackage{fancyhdr}
\usepackage[margin=1in]{geometry}
\usepackage{multicol}
\usepackage{bm}
\usepackage{listings}
\PassOptionsToPackage{usenames,dvipsnames}{color}  %% Allow color names
\usepackage{pdfpages}
\usepackage{algpseudocode}
\usepackage{tikz}
\usetikzlibrary{automata,positioning}
\usepackage{enumitem}
\usepackage[T1]{fontenc}
\usepackage{inconsolata}
\usepackage{framed}
\usepackage{wasysym}
\usepackage[thinlines]{easytable}
\usepackage{wrapfig}
\usepackage{hyperref}
\usepackage{mathrsfs}
\usepackage{tabularx} % also loads 'array' package
\newcolumntype{C}{>{\centering\arraybackslash}X} % centered version of 'X' columns
\usepackage{diagbox}

\title{CS 103: Mathematical Foundations of Computing\\Problem Set \#7}
\author{Your Name(s) Here}
\date{\today}
\lhead{Your Name(s) Here}
\chead{Problem Set \#7}
\rhead{\today}
\lfoot{}
\cfoot{CS 103: Mathematical Foundations of Computing --- Winter 2020}
\rfoot{\thepage}

\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\absfit}[1]{\left\lvert #1 \right\rvert}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\eval}[3]{\left[#1\right]_{#2}^{#3}}
\renewcommand{\(}{\left(}
\renewcommand{\)}{\right)}
\newcommand{\floor}[1]{\left\lfloor#1\right\rfloor}
\newcommand{\ceil}[1]{\left\lceil#1\right\rceil}
\newcommand{\pd}[1]{\frac{\partial}{\partial #1}}
\newcommand{\inner}[1]{\langle#1\rangle}
\newcommand{\cond}{\bigg|}
\newcommand{\rank}[1]{\mathbf{rank}(#1)}
\newcommand{\range}[1]{\mathbf{range}(#1)}
\newcommand{\nullsp}[1]{\mathbf{null}(#1)}
\newcommand{\repr}[1]{\left\langle#1\right\rangle}
\newcommand\VRule[1][\arrayrulewidth]{\vrule width #1}

\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator{\Tr}{\mathbf{Tr}}
\DeclareMathOperator{\diag}{\mathbf{diag}}
\DeclareMathOperator{\dist}{\mathbf{dist}}
\DeclareMathOperator{\prob}{\mathbf{prob}}
\DeclareMathOperator{\dom}{\mathbf{dom}}
\DeclareMathOperator{\E}{\mathbf{E}}
\DeclareMathOperator{\Q}{\mathbb{Q}}
\DeclareMathOperator{\R}{\mathbb{R}}
\DeclareMathOperator{\var}{\mathbf{var}}
\DeclareMathOperator{\quartile}{\mathbf{quartile}}
\DeclareMathOperator{\conv}{\mathbf{conv}}
\DeclareMathOperator{\VC}{VC}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\Ber}{Bernoulli}
\DeclareMathOperator{\NP}{\mathbf{NP}}
\DeclareMathOperator{\coNP}{\mathbf{coNP}}
\DeclareMathOperator{\TIME}{\mathsf{TIME}}
\DeclareMathOperator{\polytime}{\mathbf{P}}
\DeclareMathOperator{\PH}{\mathbf{PH}}
\DeclareMathOperator{\SIZE}{\mathbf{SIZE}}
\DeclareMathOperator{\ATIME}{\mathbf{ATIME}}
\DeclareMathOperator{\SPACE}{\mathbf{SPACE}}
\DeclareMathOperator{\ASPACE}{\mathbf{ASPACE}}
\DeclareMathOperator{\NSPACE}{\mathbf{NSPACE}}
\DeclareMathOperator{\Z}{\mathbb{Z}}
\DeclareMathOperator{\N}{\mathbb{N}}
\DeclareMathOperator{\EXP}{\mathbf{EXP}}
\DeclareMathOperator{\NEXP}{\mathbf{NEXP}}
\DeclareMathOperator{\NTIME}{\mathbf{NTIME}}
\DeclareMathOperator{\DTIME}{\mathbf{DTIME}}
\DeclareMathOperator{\poly}{poly}
\DeclareMathOperator{\BPP}{\mathbf{BPP}}
\DeclareMathOperator{\ZPP}{\mathbf{ZPP}}
\DeclareMathOperator{\RP}{\mathbf{RP}}
\DeclareMathOperator{\coRP}{\mathbf{coRP}}
\DeclareMathOperator{\BPL}{\mathbf{BPL}}
\DeclareMathOperator{\IP}{\mathbf{IP}}
\DeclareMathOperator{\PSPACE}{\mathbf{PSPACE}}
\DeclareMathOperator{\NPSPACE}{\mathbf{NPSPACE}}
\DeclareMathOperator{\SAT}{\mathsf{SAT}}
\DeclareMathOperator{\NL}{\mathbf{NL}}
\DeclareMathOperator{\PCP}{\mathbf{PCP}}
\DeclareMathOperator{\PP}{\mathbf{PP}}
\DeclareMathOperator{\cost}{cost}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbf{Pr}}

\definecolor{shadecolor}{gray}{0.9}

\theoremstyle{plain}
\newtheorem*{lem}{Lemma}

\theoremstyle{plain}
\newtheorem*{claim}{Claim}

\newtheorem{theorem}{Theorem}[section]
\newtheorem*{thm}{Theorem}
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}

\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0.4pt}

\setlength{\parindent}{0pt}

\pagestyle{fancy}

\renewcommand{\thefootnote}{\fnsymbol{footnote}}

% start MZ
\usetikzlibrary{automata,positioning}
\let\oldemptyset\emptyset
\renewcommand{\emptyset}{\text{\O}}
\renewcommand\qedsymbol{$\blacksquare$}
\newenvironment{prf}{{\bfseries Proof.}}{\qedsymbol}
\renewcommand{\emph}[1]{\textit{\textbf{#1}}}
\newcommand{\annotate}[1]{\textit{\textcolor{blue}{#1}}}
\usepackage{stmaryrd}
\usepackage{minted}
\usepackage{mathtools}
\usepackage{alltt}
\newcommand{\ttt}[1]{\texttt{#1}}
\newcommand{\match}[1]{\textbf{\underline{#1}}}
\usepackage{parskip}
\newcommand{\lcurly}{\textbf{\ttt{\{}}}
\newcommand{\rcurly}{\textbf{\ttt{\}}}}

% end MZ

\begin{document}

\maketitle{\vspace{-1.0cm}}

\section{Designing Regular Expressions}

Below are a list of alphabets and languages over those alphabets. For each language, write a regular expression for that language. \emph{Please use our online tool to design, test, and submit your regular expressions. Typed or handwritten solutions will not be accepted.} To use it, visit the CS103 website and click the ``Regex Editor'' link under the ``Resources'' header. If you submit in a pair, please tell us in your GradeScope submission who submitted your answers to this question. Also, as a reminder, please test your submissions thoroughly, since we'll be grading them with an autograder.

\begin{enumerate}[label*=\roman*.,ref=\roman*]
    \item Let $\Sigma = \{\ttt{a}, \ttt{b}, \ttt{c}, \ttt{d}, \ttt{e}\}$. Write a regular expression for the language $\{w \in \Sigma^* \mid $ the letters in $w$ are sorted alphabetically\}. 
    
    \item Write a regular expression for the complement of the language from part (i) of this problem.
    
    \annotate{As with NFAs, there's no simple way to start with a regex for a language $L$ and to turn it into a regex for $\overline{L}$.}
    
    \item On Unix-style operating systems like macOS or Linux, files are organized into directories. You can reference a file by giving a \textit{path} to the file, a series of directory names separated by slashes. For example, the path \ttt{/home/username/} might represent a user's home directory, and a path like \ttt{/home/username/Documents/PS7.tex} might represent that person's solution to this problem set. Paths that start with a slash character are called \textit{absolute paths} and say exactly where the file is on disk. Paths that don't start with a slash are called \textit{relative paths} and say where, relative to the current folder, a file can be found. For example, if I'm logged into my computer and am in my home folder, I could look up the file \ttt{Documents/PS7.tex} to find my solution to this problem set.
    
    The general pattern here is that a file path consists of a series of directory or file names separated by slashes. That path might optionally start with a slash, but isn't required to, and it might option- ally end with a slash, but isn't required to. However, you can't have two consecutive slashes. \footnote{In some cases you technically \textit{can} have multiple consecutive slashes, but we'll ignore that for now.}
    
    Let $\Sigma = \{a,/\}$. Write a regular expression for $L = \{w \in \Sigma^* \mid w$ represents the name of a file path on a Unix-style system \}. For example, $\ttt{/aaa/a/aa} \in L$, $\ttt{/} \in L$, $\ttt{a} \in L$, $\ttt{/a/a/a/} \in L$, and $\ttt{aaa/} \in L$, but $\ttt{//a//} \notin L$, $\ttt{a//a} \notin L$, and $\varepsilon \notin L$.
    
    Fun fact: this problem comes from former TA Amy Liu, who fixed a bug in industrial code that arose when someone wrote the wrong regex for this language. Oops.
    
    \item Suppose you are taking a walk with your dog on a leash of length two. Let $\Sigma = \{\ttt{y}, \ttt{d}\}$ and let  $L = \{w \in \Sigma^* \mid w$ represents a walk with your dog on a leash where you and your dog both end up at the same location\}. For example, we have $\ttt{yyddddyy} \in L$ because you and your dog are never more than two steps apart and both of you end up four steps ahead of where you started; similarly, $\ttt{ddydyy} \in L$. However, $\ttt{yyyyddd} \notin L$, since halfway through your walk you're three steps ahead of your dog; $\ttt{ddyd} \notin L$, because your dog ends up two steps ahead of you; and $\ttt{ddyddyyy} \notin L$, because at one point your dog is three steps ahead of you. Write a regular expression for $L$.
    
    \annotate{Note that, unlike Problem Set Six, you and your dog \textbf{must} end at the same position.}
    
\end{enumerate}
\begin{shaded}
Write the SUNetID of the person who submitted the regular expressions here. 
\end{shaded}


\section{Finite Languages}

A language $L$ is called \emph{finite} if $L$ contains finitely many strings (that is, $|L|$ is a natural number). It turns out that all finite languages are regular. Given a finite language $L$, explain how to write a regular expression for $L$. Briefly justify your answer; no formal proof is necessary.
    
\annotate{Watch for edge cases!}
    
\begin{shaded}
Provide an explanation here. 
\end{shaded}
    
\annotate{As an optional thought exercise, you might also think about how you could make an NFA or a DFA for any finite language. As a hint, look up the trie data structure on Wikipedia.}

\pagebreak

\section{State Elimination}

The state elimination algorithm gives a way to transform a finite automaton (a DFA or a NFA) into a regular expression. It's a really beautiful algorithm once you get the hang of it, so we thought that we'd let you try it out on a particular example.

Let $\Sigma = \{\ttt{a}, \ttt{b}\}$ and let $L = \{w \in \Sigma^* \mid w$ has an even number of \ttt{a}'s and an even number of \ttt{b}'s\}. Below is a finite automaton for $L$ that we've prepared for the state elimination algorithm by adding in a new start state $q_{\text{start}}$ and a new accept state $q_{\text{acc}}$:

\begin{center}
\tikzstyle{accepting}=[double distance=2pt, outer sep=1pt+\pgflinewidth]
\begin{tikzpicture}[shorten >=1pt,node distance=2.2cm,on grid,auto, state/.style={circle, draw, minimum size=1.1cm}] 
   % \node[state, options?] (identifier) [placement] {node label};
   \node[state, initial] (s) {$q_{\text{start}}$};
   \node[state] (0) [right=of s] {$q_0$};
   \node[state] (1) [right=of 0] {$q_1$}; 
   \node[state, accepting] (a) [above=of 0] {$q_{\text{acc}}$};
   \node[state] (2) [below=of 0] {$q_2$}; 
   \node[state] (3) [below=of 1] {$q_3$}; 
    
    % (start identifier) edge [options?] node {edge label} (target identifier)
    %                    edge [options?] node {edge label} (another target identifier)                  
    \path[->] 
    (s) edge node {$\varepsilon$} (0)
    (0) edge [bend left=10] node {$a$} (1)
        edge node {$\varepsilon$} (a)
        edge [bend left=10] node {$b$} (2)
    (1) edge [bend left=10] node {$a$} (0)
        edge [bend left=10] node {$b$} (3)
    (2) edge [bend left=10] node {$b$} (0)
        edge [bend left=10] node {$a$} (3)
    (3) edge [bend left=10] node {$b$} (1)
        edge [bend left=10] node {$a$} (2);
\end{tikzpicture}
\end{center}

We'd like you to use the state elimination algorithm to produce a regular expression for $L$.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
    \item Run two steps of the state elimination algorithm on the above automaton. Specifically, first remove state $q_1$, then remove state $q_2$. Show your result at this point.
    
    \textit{\textcolor{blue}{ Go slowly. Remember that to eliminate a state q, you should identify all pairs of states $q_{in}$ and $q_{out}$ where there's a transition from $q_{in}$ to q and from q to $q_{out}$, then add shortcut edges from $q_{in}$ to $q_{out}$ to bypass state q. Remember that $q_{in}$ and $q_{out}$ can be the same state. If you've done everything right at the end of this stage, none of the transitions you have at this point should have Kleene stars in them. }}
    
    \begin{shaded}
    Provide the resulting automaton here. 
    % You can copy and modify the above automaton drawn using the tikz automata package, 
    % or use a finite automata editor such as http://madebyevan.com/fsm/ 
    % If you choose to draw and then scan/take a picture, *please* make sure that the image is clearly legible in Gradescope!
    \end{shaded}
    
    \item Finish the state elimination algorithm, showing your work. \emph{Submit your resulting regular expression through our regular expression tool} along the lines of what you did in Problem One.
    
    \annotate{You should start seeing Kleene stars appearing as you remove the remaining states.}
    
    \annotate{Not sure whether you have the right answer? \textbf{Test your result thoroughly!}}
    
    \begin{shaded}
    Provide an answer here. 
    \end{shaded}
    
    \item Without making reference to the original automaton given above, give an intuitive explanation for how the regular expression you found in part (ii) works.
    
    \annotate{The regex you've found works by matching strings in a very creative way. Tell us what that way is.}
    
    \begin{shaded}
    Provide an answer here. 
    \end{shaded}
\end{enumerate}

\pagebreak

\section{At Least Three Point Five}

The Myhill-Nerode theorem is a powerful tool for finding nonregular languages, but it can take some adjusting to get used to.

Let $\Sigma = \{\ttt{a}, \ttt{b}\}$ and consider the following language:
\begin{equation*}
L = \{w \in \Sigma^* \mid \text{there are at least as many \ttt{a}'s as \ttt{b}'s in }w\}.
\end{equation*}
This language $L$ isn't regular; make sure you have an intuition for why this is. 

Below is an attempted proof that $L$ isn't regular. Although the claim it's proving is indeed true, this proof contains an error that renders it incorrect.
\begin{quote}
\emph{Theorem:} $L$ is not a regular language.

\emph{Proof:} Consider the set $S = \{ a^n\ |\ n \in \N \}$. This set is infinite because it contains one string for each natural number. We will prove that any two distinct strings in $S$ are distinguishable relative to $L$. To do so, consider any distinct strings $\ttt{a}^m, \ttt{a}^n \in S$, and assume without loss of generality that $m > n$. Then $\ttt{b}^m\ttt{a}^m \in L$ because this string contains the same number of \ttt{a}'s and \ttt{b}'s, but $\ttt{b}^m\ttt{a}^n \notin L$ because it contains $m$ \ttt{b}'s and $n$ \ttt{a}'s and $m > n$. Therefore, we see that $\ttt{a}^m \not\equiv_L \ttt{a}^n$. This means that $S$ is an infinite set of strings that are pairwise distinguishable relative to $L$. Therefore, by the Myhill-Nerode theorem, $L$ is not regular. \qedsymbol
\end{quote}

What's wrong with this proof? Be as specific as possible.

\annotate{The best way to identify a flaw in a proof is to point to a specific claim that's being made that's not true or not properly substantiated and to explain why.}

\begin{shaded}
Write your answer here.
\end{shaded}

\pagebreak

\section{Embracing the Braces}

Let $\Sigma$ be an alphabet containing two characters, the open curly brace character \lcurly and the close curly brace character \rcurly. Consider the following language over $\Sigma$:
\begin{equation*}
L_1 = \{w \in \Sigma^* \mid w\text{ is a string of balanced curly braces}\}
\end{equation*}
For example, we have $\lcurly\rcurly \in L_1$, $\lcurly\lcurly\rcurly\rcurly \in L_1$, $\lcurly\lcurly\rcurly\lcurly\rcurly\rcurly\lcurly\rcurly \in L_1$, $\varepsilon \in L_1$, and $\lcurly\lcurly\rcurly\rcurly\lcurly\lcurly\lcurly\rcurly\lcurly\rcurly\rcurly\rcurly \in L_1$, but $\rcurly\lcurly\ \not\in L_1$,
$\lcurly\lcurly\rcurly \not\in L_1$, and $\lcurly\lcurly\lcurly\rcurly\rcurly\rcurly\rcurly \not\in L_1$. This question explores properties of this language.

\begin{enumerate}[label*=\roman*.,ref=\roman*]
    \item Prove that $L_1$ is not a regular language. One consequence of this result -- which you don't need to prove -- is that most languages that support some sort of nested parentheses, such as most programming languages and HTML, aren't regular and so can't be parsed using regular expressions.

    \annotate{As with all problems involving nonregular languages, proceed with this one in stages. First, ask yourself: if you were reading an input string from left to right, what information would you have to keep track of? The Myhill-Nerode theorem asks you to find an infinite set of strings that are all pairwise distinguishable, so try creating an infinite set of strings, one for each possible value that this information could take on.}
    
    \begin{shaded}
    Provide a proof here. 
    \end{shaded}
\end{enumerate}

Let's say that the \emph{nesting depth} of a string of balanced parentheses is the maximum number of unmatched open parentheses at any point inside the string. For example, the string \lcurly\lcurly\rcurly\lcurly\lcurly\rcurly\rcurly\rcurly\ has nesting depth three, the string \lcurly\lcurly\rcurly\lcurly\rcurly\rcurly\lcurly\rcurly\ has nesting depth two, and the string $\varepsilon$ has nesting depth zero. 

Consider the language $L_2 = \{w \in \Sigma^* \mid w$ is a string of balanced curly braces with nesting depth at most 4\}. For example, $\lcurly\rcurly \in L_2$, $\lcurly\lcurly\rcurly\lcurly\rcurly\rcurly \in L_2$, and $\lcurly\lcurly\lcurly\lcurly\rcurly\rcurly\rcurly\rcurly\lcurly\lcurly\rcurly\rcurly \in L_2$, but $\lcurly\lcurly\lcurly\lcurly\lcurly\rcurly\rcurly\rcurly\rcurly\rcurly \notin L_2$ because although it's a string of balanced curly braces, the nesting goes five levels deep.

\begin{enumerate}[resume*]
    \item Look back at your proof from part (i) of this problem. Imagine that you were to take that exact proof and blindly replace every instance of ``$L_1$'' with ``$L_2$.'' This would give you a (incorrect) proof that $L_2$ is nonregular (which we know has to be wrong because $L_2$ is indeed regular--if you need convincing, think about how you might design a DFA for it). Where would the error be in that proof? Be as specific as possible.
    
    \annotate{Again, you should be able to point at a specific spot in the proof that contains a logic error and explain exactly why the statement in question is not true or not supported by the preceding statements. If you can't do this, it likely means you have an error in your proof from part (i)!}
    
    \begin{shaded}
    Provide an answer here. 
    \end{shaded}
    
\end{enumerate}

\pagebreak 

\section{State Lower Bounds}

The Myhill-Nerode theorem we proved in lecture is actually a special case of a more general theorem about regular languages that can be used to prove lower bounds on the number of states necessary to construct a DFA for a given language.

\begin{enumerate}[label*=\roman*.,ref=\roman*]
    \item Let $L$ be a language over $\Sigma$. Suppose there's a set $S$, which may be finite and which may be infinite, such that any two distinct strings $x, y \in S$ are distinguishable relative to $L$ (that is, $x \not\equiv_L y$ for any two strings $x, y \in S$ where $x \ne y$.) Prove that any DFA for $L$ must have at least $|S|$ states. (You sometimes hear this referred to as \textit{lower-bounding} the size of any DFA for $L$.)
    
    \begin{shaded}
    Provide a proof here. 
    \end{shaded}
\end{enumerate}

According to old-school Twitter rules, all tweets need to be 140 characters or less. Let $\Sigma$ be the alphabet of characters that can legally appear in a tweet and consider the following language:
\begin{equation*}
\textit{TWEETS} = \{w \in \Sigma^* \mid |w| \leq 140\}.
\end{equation*}
This is the language of all legal tweets, assuming the empty string is a legal tweet. The good news is that this language is regular. The bad news is that any DFA for it has to be pretty large.

\begin{enumerate}[resume*]
\item Using your result from part (i), prove that any DFA for \textit{TWEETS} must have at least 142 states.

\annotate{It might be easier to tackle this problem if you consider replacing 140 and 142 with some smaller numbers (say, 2 and 4) to build up an intuition. And work backwards -- what will you need to do to invoke part (i)?}

\begin{shaded}
Provide a proof here. 
\end{shaded}

\item Define a 142-state DFA for \textit{TWEETS} using the formal 5-tuple definition of a DFA. Briefly explain how your DFA works. No formal proof is necessary.

\annotate{Again, this might be a lot easier to do if you first reduce 140 and 142 to 2 and 4, respectively, and see what you come up with. Start by drawing out what the DFA would look like, then think about how you'd formalize your idea as a 5-tuple.\\
Remember to define the transition function you'll want to use our methods of formally defining a function that we learned earlier in the course; in particular look at piece-wise functions that you made for PSet 4 Question 1.}

\begin{shaded}
Provide an answer here. 
\end{shaded}
\end{enumerate}

Your results from parts (ii) and (iii) collectively establish that the smallest possible DFA for \textit{TWEETS} has exactly 142 states. This approach to finding the smallest object of some type -- using some theorem to prove a lower bound (``we need at least this many states'') combined with a specific object of the given type (``we certainly can't do worse than this'') is a common strategy in algorithm design and computational complexity theory. If you take classes like CS161, CS254, etc., you'll likely see similar sorts of approaches!

\pagebreak


\section{Regular Languages and Equivalence Relations}

Throughout this problem set you've been working with the idea that we can take a language $L$ over some alphabet $Î£$, then work with its distinguishability relation $\not\equiv_L$. A closely related binary relation is the \emph{indistinguishability} relation for L, denoted $\equiv_L$. It's also a binary relation over $\Sigma^*$, and its definition is the negation of the one for distinguishability:
$$x \equiv_L y \text{ if } \forall w \in \Sigma^*. (xw \in L \leftrightarrow yw \in L).$$
Amazingly, this is always an equivalence relation, regardless of what $L$ is!

\begin{enumerate}[label=\roman*.]
    \item Prove that if $L$ is a language over $\Sigma$, then $\equiv_L$ is an equivalence relation over $\Sigma^*$.

    \begin{shaded}
    Provide a proof here.
    \end{shaded}
\end{enumerate}

Whenever you see an equivalence relation, you should immediately start thinking about what its equivalence classes are. Doing so will usually tell you something interesting.

Let's make this more concrete. Let $\Sigma = \{a, b\}$ and consider the language $M = \{w \in \Sigma^* \mid$ the number of \ttt{b}'s in $w$ is congruent to 1 modulo 5 or to 3 modulo 5\}. For example, $\ttt{a\match{b}a} \in M, \ttt{\match{b}aaa\match{b}aaa\match{b}} \in M$, and $\ttt{\match{bbbbbb}} \in M$, but $\ttt{aa} \notin M$ and $\ttt{a\match{bb}a}\notin M$.

\begin{enumerate}[resume*]
    \item How many equivalence classes does the $\equiv_M$ relation have? Briefly describe what those equivalence classes are and give a system of representatives for $\equiv_M$.
    
    \annotate{Need a refresher on systems of representatives? Check out Problem Set Three.}

    \begin{shaded}
    Write your answer here.
    \end{shaded}
\end{enumerate}

You might have noticed that each equivalence class of $\equiv_M$ either consists of a bunch of strings not in $M$ or of a bunch of strings that are in $M$. That's not a coincidence!

\begin{enumerate}[resume*]
    \item Let L be a language over some alphabet $\Sigma$ and let $x \in \Sigma^*$ be some string. Prove that either \textit{every string} in $[x]_{\equiv_L}$ is in $L$ or that \textit{no strings} in $[x]_{\equiv_L}$ are.
    
    \begin{shaded}
    Provide a proof here.
    \end{shaded}
\end{enumerate}

The number of equivalence classes of an equivalence relation is called its \emph{index}; the index of an equivalence relation $R$ is denoted \emph{I(R)}. This quantity might be finite, or it might be an infinite cardinality like $\aleph_0$, or even one of the infinities bigger than that.

Armed with the idea of an index, we can state a powerful theorem about finite automata:
\begin{center}
    \emph{Theorem:} If $L$ is a language over $\Sigma$, then every DFA for $L$ must have at least $I(\equiv_L)$ states.
\end{center}
In other words, there's a connection between the number of equivalence classes of a particular binary relation and the minimum sizes of DFAs for that language!

\begin{enumerate}[resume*]
    \item Prove the above theorem. Feel free to use the \emph{axiom of choice}, which says that every equivalence relation has at least one system of representatives.
    
    \annotate{Proving this theorem is mostly an exercise in connecting together ideas you've seen used in other places. Think about the relationship between indices and systems of representatives, between distinguishability and indistinguishability, and between what you're doing here and what you've done earlier on this problem set.}
    
    \begin{shaded}
    Provide a proof here.
    \end{shaded}
\end{enumerate}

There's a very nice intuition for what this theorem says. You can think of the indistinguishability relation for a language $L$ as pinning down the idea ``a DFA for $L$ can't tell the difference between these two strings.'' If you think back to our intuition behind DFA design -- build a DFA where each state keeps track of some different piece of information -- then you can think of $I(\equiv_L)$ as capturing the number of different pieces of information you'd need to remember. The theorem then says that if you want to build a DFA for a language $L$, you'll need at least one state per piece of information.


\end{document}