%
% LaTeX Problem Set Template by Sachin Padmanabhan
% I created this when I was a freshman in CS 103,
% and I continue to use it to this day.
%
% Hope you enjoy!
%
% There may be problems with this template.
% If so, feel free to contact me.
%
\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{mathdots}
\usepackage[pdftex]{graphicx}
\usepackage{fancyhdr}
\usepackage[margin=1in]{geometry}
\usepackage{multicol}
\usepackage{bm}
\usepackage{listings}
\PassOptionsToPackage{usenames,dvipsnames}{color} %% Allow color names
\usepackage{pdfpages}
\usepackage{algpseudocode}
\usepackage{tikz}
\usetikzlibrary{automata,positioning}
\usepackage{enumitem}
\usepackage[T1]{fontenc}
\usepackage{inconsolata}
\usepackage{framed}
\usepackage{wasysym}
\usepackage[thinlines]{easytable}
\usepackage{wrapfig}
\usepackage{hyperref}
\usepackage{mathrsfs}
\usepackage{tabularx} % also loads 'array' package
\newcolumntype{C}{>{\centering\arraybackslash}X} % centered version of 'X' columns
\usepackage{diagbox}
\title{CS 103: Mathematical Foundations of Computing\\Problem Set \#7}
\author{Sachin Padmanabhan}
\date{\today}
\lhead{Sachin Padmanabhan}
\chead{Problem Set \#7}
\rhead{\today}
\lfoot{}
\cfoot{CS 103: Mathematical Foundations of Computing --- Winter 2018}
\rfoot{\thepage}
\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\absfit}[1]{\left\lvert #1 \right\rvert}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\eval}[3]{\left[#1\right]_{#2}^{#3}}
\renewcommand{\(}{\left(}
\renewcommand{\)}{\right)}
\newcommand{\floor}[1]{\left\lfloor#1\right\rfloor}
\newcommand{\ceil}[1]{\left\lceil#1\right\rceil}
\newcommand{\pd}[1]{\frac{\partial}{\partial #1}}
\newcommand{\inner}[1]{\langle#1\rangle}
\newcommand{\cond}{\bigg|}
\newcommand{\rank}[1]{\mathbf{rank}(#1)}
\newcommand{\range}[1]{\mathbf{range}(#1)}
\newcommand{\nullsp}[1]{\mathbf{null}(#1)}
\newcommand{\repr}[1]{\left\langle#1\right\rangle}
\newcommand\VRule[1][\arrayrulewidth]{\vrule width #1}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator{\Tr}{\mathbf{Tr}}
\DeclareMathOperator{\diag}{\mathbf{diag}}
\DeclareMathOperator{\dist}{\mathbf{dist}}
\DeclareMathOperator{\prob}{\mathbf{prob}}
\DeclareMathOperator{\dom}{\mathbf{dom}}
\DeclareMathOperator{\E}{\mathbf{E}}
\DeclareMathOperator{\Q}{\mathbb{Q}}
\DeclareMathOperator{\R}{\mathbb{R}}
\DeclareMathOperator{\var}{\mathbf{var}}
\DeclareMathOperator{\quartile}{\mathbf{quartile}}
\DeclareMathOperator{\conv}{\mathbf{conv}}
\DeclareMathOperator{\VC}{VC}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\Ber}{Bernoulli}
\DeclareMathOperator{\NP}{\mathbf{NP}}
\DeclareMathOperator{\coNP}{\mathbf{coNP}}
\DeclareMathOperator{\TIME}{\mathsf{TIME}}
\DeclareMathOperator{\polytime}{\mathbf{P}}
\DeclareMathOperator{\PH}{\mathbf{PH}}
\DeclareMathOperator{\SIZE}{\mathbf{SIZE}}
\DeclareMathOperator{\ATIME}{\mathbf{ATIME}}
\DeclareMathOperator{\SPACE}{\mathbf{SPACE}}
\DeclareMathOperator{\ASPACE}{\mathbf{ASPACE}}
\DeclareMathOperator{\NSPACE}{\mathbf{NSPACE}}
\DeclareMathOperator{\Z}{\mathbb{Z}}
\DeclareMathOperator{\N}{\mathbb{N}}
\DeclareMathOperator{\EXP}{\mathbf{EXP}}
\DeclareMathOperator{\NEXP}{\mathbf{NEXP}}
\DeclareMathOperator{\NTIME}{\mathbf{NTIME}}
\DeclareMathOperator{\DTIME}{\mathbf{DTIME}}
\DeclareMathOperator{\poly}{poly}
\DeclareMathOperator{\BPP}{\mathbf{BPP}}
\DeclareMathOperator{\ZPP}{\mathbf{ZPP}}
\DeclareMathOperator{\RP}{\mathbf{RP}}
\DeclareMathOperator{\coRP}{\mathbf{coRP}}
\DeclareMathOperator{\BPL}{\mathbf{BPL}}
\DeclareMathOperator{\IP}{\mathbf{IP}}
\DeclareMathOperator{\PSPACE}{\mathbf{PSPACE}}
\DeclareMathOperator{\NPSPACE}{\mathbf{NPSPACE}}
\DeclareMathOperator{\SAT}{\mathsf{SAT}}
\DeclareMathOperator{\NL}{\mathbf{NL}}
\DeclareMathOperator{\PCP}{\mathbf{PCP}}
\DeclareMathOperator{\PP}{\mathbf{PP}}
\DeclareMathOperator{\cost}{cost}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbf{Pr}}
\definecolor{shadecolor}{gray}{0.9}
\theoremstyle{plain}
\newtheorem*{lem}{Lemma}
\theoremstyle{plain}
\newtheorem*{claim}{Claim}
\theoremstyle{definition}
\newtheorem*{answer}{Answer}
\newtheorem{theorem}{Theorem}[section]
\newtheorem*{thm}{Theorem}
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0.4pt}
\setlength{\parindent}{0pt}
\pagestyle{fancy}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\begin{document}
\maketitle{\vspace{-1.0cm}}
\section{Designing Regular Expressions}
Below are a list of alphabets and languages over those alphabets. For each language, write a regular expression
for that language. \\
\textit{\textbf{Please use our online tool to design, test, and submit your regular expressions. Typed or handwritten
solutions will not be accepted.}} To use it, visit the CS103 website and click the ``Regex Editor'' link under
the ``Resources'' header. As before, if you submit in a pair, please make a note in your GradeScope submission
of which partner submitted your answers to this question so that we know where to look. Also, as
a reminder, please test your submissions thoroughly, since we'll be grading them with an autograder.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ w\text{ does not contain }ba\text{ as a substring }\}$. Write a regular expression for $L$.
\item Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ w\text{ does not contain }bb\text{ as a substring }\}$. Write a regular expression for $L$.
\textit{\textcolor{blue}{ You may want to draw out some sample strings in this language and look for a pattern. }}
\item Suppose you are taking a walk with your dog on a leash of length two. Let $\Sigma = \{y, d\}$ and let
$L = \{\ w \in \Sigma^*\ |\ w$ represents a walk with your dog on a leash where you and your dog both end up
at the same location $\}$. For example, the string $yyddddyy \in L$ because you and your dog are
never more than two steps apart and both of you end up four steps ahead of where you started;
similarly, $ddydyy \in L$. However, $yyyyddd \not\in L$, since halfway through your walk you are three
steps ahead of your dog; $ddyd \not\in L$, because your dog ends up two steps ahead of you; and
$ddyddyyy \not\in L$, because at one point in your walk your dog is three steps ahead of you.
Write a regular expression for $L$.
\item Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ w \ne ab\ \}$. Write a regular expression for $L$.
\item Let $\Sigma = \{\textsc{M, D, C, L, X, V, I}\}$ and let $L = \{\ w \in \Sigma^*\ |\ w$ is number less than 2,000 represented in Roman numerals $\}$. For example, $\textsc{CMXCIX} \in L$, since it represents the number 999, as are the strings
L (50), VIII (8), DCLXVI (666), CXXXVII (137), CDXII (412), and MDCXVIII (1,618). However,
we have $\textsc{VIIII} \not\in L$ (you'll never have four I's in a row; use IX or IV instead), that $\textsc{MM} \not\in L$ (it's a
Roman numeral, but it's for 2,000, which is too large), that $\textsc{VX} \not\in L$ (this isn't a valid Roman numeral),
and that $\textsc{IM} \not\in L$ (the notation of using a smaller digit to subtract from a larger one only lets
you use I to prefix V and X, or X to prefix L and C, or C to prefix D and M). The Romans didn't have
a way of expressing the number 0, so to make your life easier we'll say that $\varepsilon \in L$ and that the
empty string represents 0. (Oh, those silly Romans.) Write a regular expression for L. \\
(As a note, we're using the ``standard form'' of Roman numerals. You can see a sample of numbers
written out this way via \href{http://literacy.kent.edu/Minigrants/Cinci/romanchart.htm}{\underline{this link}}.)
\end{enumerate}
\begin{shaded}
Write the SUNetID of the person who submitted the regular expressions here.
\end{shaded}
\section{Finite and Cofinite Languages}
A language $L$ is called \textbf{\textit{finite}} if $L$ contains finitely many strings (that is, $|L|$ is a natural number.) A language $L$ is called \textbf{\textit{cofinite}} if its complement is a finite language;
that is, $L$ is cofinite if $|\overline{L}|$ is a natural number.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Given a finite language $L$, explain how to write a regular expression for $L$. Briefly justify your answer;
no formal proof is necessary. This shows that all finite languages are regular.
\begin{shaded}
Provide a proof here.
\end{shaded}
\item Look up the \textit{\textbf{trie}} data structure on Wikipedia. Explain, in your own words, how a trie can be
thought of as a finite automaton. This gives another view on why finite languages are regular.
\begin{shaded}
Provide a proof here.
\end{shaded}
\item Prove that any cofinite language is regular.
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
\newpage
\section{State Elimination}
The state elimination algorithm gives a way to transform a finite automaton (DFA or NFA) into a regular
expression. It's a really beautiful algorithm once you get the hang of it, so we thought that we'd let you try
it out on a particular example. \\
Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ w$ has an even number of $a$'s and an even number of $b$'s $\}$. Below is a
finite automaton for $L$ that we've prepared for the state elimination algorithm by adding in a new start
state $q_{start}$ and a new accept state $q_{acc}$: \\
\begin{tikzpicture}[shorten >=1pt,node distance=2.2cm,on grid,auto, state/.style={circle, draw, minimum size=1.2cm}]
% \node[state, options?] (identifier) [placement] {display label};
\node[state, initial] (s) {$q_{start}$};
\node[state] (0) [right=of s] {$q_0$};
\node[state] (1) [right=of 0] {$q_1$};
\node[state, accepting] (a) [above=of 0] {$q_{acc}$};
\node[state] (2) [below=of 0] {$q_2$};
\node[state] (3) [below=of 1] {$q_3$};
% (start identifier) edge [options?] node {display label} (target identifier)
% edge [options?] node {display label} (another target identifier)
\path[->]
(s) edge node {$\varepsilon$} (0)
(0) edge [bend left=10] node {$a$} (1)
edge node {$\varepsilon$} (a)
edge [bend left=10] node {$b$} (2)
(1) edge [bend left=10] node {$a$} (0)
edge [bend left=10] node {$b$} (3)
(2) edge [bend left=10] node {$b$} (0)
edge [bend left=10] node {$a$} (3)
(3) edge [bend left=10] node {$b$} (1)
edge [bend left=10] node {$a$} (2);
\end{tikzpicture} \\
We'd like you to use the state elimination algorithm to produce a regular expression for $L$.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Run two steps of the state elimination algorithm on the above automaton. Specifically, first remove
state $q_1$, then remove state $q_2$. Show your result at this point.
\textit{\textcolor{blue}{ Go slowly. Remember that to eliminate a state q, you should identify all pairs of states $q_{in}$ and $q_{out}$ where there's a transition from $q_{in}$ to q and from q to $q_{out}$, then add shortcut edges from $q_{in}$ to $q_{out}$ to bypass state q.
Remember that $q_{in}$ and $q_{out}$ can be the same state. If you've done everything right at the end of this stage,
none of the transitions you have at this point should have Kleene stars in them. }}
\begin{shaded}
Provide the resulting automaton here.
% You can copy and modify the above automaton drawn using the tikz automata package,
% or use a finite automata editor such as http://madebyevan.com/fsm/
% If you choose to draw and then scan/take a picture, *please* make sure that the image is clearly legible in Gradescope!
\end{shaded}
\item Finish the state elimination algorithm. What regular expression do you get for $L$?
\textit{\textcolor{blue}{ You should start seeing Kleene stars appearing as you remove the remaining states. }}
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\item Without making reference to the original automaton given above, give an intuitive explanation for
how the regular expression you found in part (ii) works.
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\end{enumerate}
\section{Distinguishable Strings}
The Myhill-Nerode theorem is one of the trickier and more nuanced theorems we've covered this quarter.
This question explores what the theorem means and, importantly, what it \textit{doesn't} mean. \\
Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ |w|\text{ is even }\}$.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Show that $L$ is a regular language.
\textit{\textcolor{blue}{ There are lots of ways you can go about doing this. Pick whatever is easiest. }}
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\item Prove that there is a infinite set $S \subseteq \Sigma^*$ where there are infinitely many pairs of distinct strings
$x, y \in S$ such that $x \not\equiv_L y$.
\textit{\textcolor{blue}{ An analogy that might help: do you understand the difference between ``a group of six people, any two of
which are friends'' versus ``a group of six people, of which six pairs of those people are friends?'' }}
\begin{shaded}
Provide a proof here.
\end{shaded}
\item Prove that there is \textit{no} infinite set $S \subseteq \Sigma^*$ where \textit{all} pairs of distinct strings $x, y \in S$ satisfy $x \not\equiv_L y$.
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
The distinction between parts (ii) and (iii) is important for the Myhill-Nerode theorem. A
language is nonregular not if you can find infinitely many pairs of distinguishable strings, but rather if you
can find infinitely many strings that are all \textit{pairwise} distinguishable.
\section{At Least Three Point Five}
The Myhill-Nerode theorem is a powerful tool for finding nonregular languages, but it can take some adjusting
to get used to. \\
Let $\Sigma = \{a, b\}$ and consider the following language:
\begin{equation*}
L = \{\ w \in \Sigma^*\ |\ \text{there are at least as many }a\text{'s as }b\text{'s in }w\ \}.
\end{equation*}
Below is a purported proof that $L$ is not a regular language.
\begin{quote}
\begin{thm}
$L$ is not a regular language.
\end{thm}
\begin{proof}
Consider the set $S = \{ a^n\ |\ n \in \N \}$. This set is infinite because it contains one string for each
natural number. We will prove that any two distinct strings in $S$ are distinguishable relative to $L$.
To do so, consider any distinct strings $a^m, a^n \in S$, and assume without loss of generality that $m > n$.
Then $b^ma^m \in L$ because this string contains the same number of $a$'s and $b$'s, but $b^ma^n \not\in L$
because it contains $m$ $b$'s and $n$ $a$'s and $m > n$. Therefore, we see that $a^m \not\equiv_L a^n$.
This means that $S$ is an infinite set of strings that are pairwise distinguishable relative to $L$.
Therefore, by the Myhill-Nerode theorem, $L$ is not regular.
\end{proof}
\end{quote}
Unfortunately, this proof is incorrect.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item What's wrong with this proof? Be as specific as possible.
\textit{\textcolor{blue}{ The best way to identify a flaw in a proof is to point to a specific claim that's being made that's not true or
not properly substantiated and to explain why. }}
\begin{shaded}
Provide an answer here.
\end{shaded}
\item Is $L$ a regular language? If so, show why it's regular. If not, prove that $L$ is not a regular language.
\textit{\textcolor{blue}{ Think about the intuition behind what makes a language regular or not regular. If your intuition tells you
that this language is regular, then use that intuition to guide your justification. If your intuition tells you that
this language is not regular, use that intuition to guide your disproof. And if you don't have an intuition,
take a look back at the lecture slides and feel free to stop by office hours! }}
\begin{shaded}
Provide an answer here.
\end{shaded}
\end{enumerate}
\section{Balanced Parentheses}
Consider the following language over $\Sigma = \{(, )\}$:
\begin{equation*}
L_1 = \{\ w \in \Sigma^*\ |\ w\text{ is a string of balanced parentheses }\}
\end{equation*}
For example, we have $() \in L_1$, $(()) \in L_1$, $(()())() \in L_1$, $\varepsilon \in L_1$, and $(())((()())) \in L_1$, but $)(\ \not\in L_1$,
$(() \not\in L_1$, and $((()))) \not\in L_1$. This question explores properties of this language.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Prove that $L_1$ is not a regular language. One consequence of this result -- which you don't need to
prove -- is that most languages that support some sort of nested parentheses, such as most programming
languages and HTML, aren't regular and so can't be parsed using regular expressions.
\textit{\textcolor{blue}{ As with all problems involving nonregular languages, proceed with this one in stages. First, ask yourself: if
you were reading an input string from left to right, what information would you necessarily have to keep
track of? The Myhill-Nerode theorem asks you to find an infinite set of strings that are all pairwise distinguishable,
so try creating an infinite set of strings, one for each possible value that this information could
take on. }}
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
Let's say that the \textit{nesting depth} of a string of balanced parentheses is the maximum number of unmatched
open parentheses at any point inside the string. For example, the string $((()))$ has nesting depth three,
the string $(()())()$ has nesting depth two, and the string $\varepsilon$ has nesting depth zero. \\
Consider the language $L_2 = \{\ w \in \Sigma^*\ |\ w$ is a string of balanced parentheses and $w$'s nesting depth is at
most four $\}$. For example, $((())) \in L_2$, $(()()) \in L_2$, and $(((())))(()) \in L_2$, but $((((())))) \not\in L_2$ because
although it's a string of balanced parentheses, the nesting goes five levels deep.
\begin{enumerate}[resume*]
\item Design a DFA for $L_2$, showing that $L_2$ is regular. A consequence of \textit{this} result is that while you
can't parse all programs or HTML with regular expressions, you can parse programs with low
nesting depth or HTML documents without deeply-nested tags using regexes. \textit{\textbf{Please submit this
DFA using the DFA editor on the course website}} and tell us on GradeScope who submitted it.
\begin{shaded}
Write the SUNetID of the person who submitted the DFA here.
\end{shaded}
\item Look back at your proof from part (i) of this problem. Imagine that you were to take that exact
proof and blindly replace every instance of ``$L_1$'' with ``$L_2$.'' This would give you a (incorrect) proof
that $L_2$ is nonregular (which we know has to be wrong because $L_2$ is indeed regular.) Where would
the error be in that proof? Be as specific as possible.
\textit{\textcolor{blue}{ Again, you should be able to point at a specific spot in the proof that contains a logic error and explain exactly why the statement in question is not true or not supported by the preceding statements. If you can't do
this, it likely means you have an error in your proof from part (i)! }}
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\item Intuitively, regular languages correspond to problems that can be solved using only finite memory.
Using this intuition and without making reference to DFAs, NFAs, or the Myhill-Nerode theorem,
explain why $L_1$ is nonregular while $L_2$ is regular.
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\end{enumerate}
\section{Tautonyms}
A \textit{\textbf{tautonym}} is a word that consists of the same string repeated twice. For example, ``caracara'' and
``dikdik'' are all tautoynms (the first is a bird, the second the cutest animal you'll ever see), as is ``hotshots''
(people who aren't very fun to be around). Let $\Sigma = \{a, b\}$ and consider the following language:
\begin{equation*}
L = \{\ ww\ |\ w \in \Sigma^*\ \}
\end{equation*}
This is the language of all tautonyms over $\Sigma$. Below is an \textit{\textbf{incorrect}} proof that $L$ is not regular:
\begin{quote}
\begin{proof}
Let $S = \{\ a^n\ |\ n \in \N\ \}$. This set is infinite because it contains one string for each natural
number. We claim that any two strings in $S$ are distinguishable relative to $L$. To see this, consider
any two distinct strings $a^m$ and $a^n$ in the set $S$, where $m \ne n$. Then $a^ma^m \in L$ but $a^na^m \not\in L$,
so $a^m \not\equiv_L a^n$. This means that $S$ is an infinite set of strings that are pairwise distinguishable relative
to $L$. Therefore, by the Myhill-Nerode theorem, $L$ is not regular.
\end{proof}
\end{quote}
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item What's wrong with this proof? Be as specific as possible.
\textit{\textcolor{blue}{ This one is subtle, and it's not the same error that you identified earlier. As before, point out a specific claim
that is made that isn't true or isn't supported by the preceding statements and justify why. }}
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\item Although the above proof is incorrect, the language $L$ isn't regular. Prove this.
\textit{\textcolor{blue}{ The most common mistake we see people make in part (ii) of this problem is to make the exact same sort of
mistake that they identified in part (i) of this problem (oops!) So go slowly here and double-check your
proof to make sure it's airtight, especially in light of what you saw in part (i). You may want to take a sentence
or two to precisely articulate why each claim that you're making is true, since as you saw above it's
easy for an erroneous statement to slip in under the radar if you're not careful! }}
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
\section{State Lower Bounds}
The Myhill-Nerode theorem we proved in lecture is actually a special case of a more general theorem
about regular languages that can be used to prove lower bounds on the number of states necessary to construct
a DFA for a given language.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Let $L$ be a language over $\Sigma$. Suppose there's a \textit{finite} set $S$ such that any two distinct strings $x, y \in S$
are distinguishable relative to $L$ (that is, $x \not\equiv_L y$ for any two strings $x, y \in S$ where $x \ne y$.) Prove that any DFA for $L$ must have at least $|S|$ states. (You sometimes hear this referred to as \textit{lower-bounding} the size of any DFA for $L$.)
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
According to old-school Twitter rules, all tweets need to be 140 characters or less. Let $\Sigma$ be the alphabet
of characters that can legally appear in a tweet and consider the following language:
\begin{equation*}
TWEETS = \{\ w \in \Sigma^*\ |\ |w| \le 140\ \}.
\end{equation*}
This is the language of all legal tweets. The good news is that this language is regular, assuming the empty string is a legal tweet. The bad news is that any DFA for it has to be pretty large.
\begin{enumerate}[resume*]
\item Using your result from part (i), prove that any DFA for $TWEETS$ must have at least 142 states.
\textit{\textcolor{blue}{ It might be easier to tackle this problem if you consider replacing 140 and 142 with some smaller numbers
(say, 2 and 4) to build up an intuition. And work backwards -- what will you need to do to invoke part (i)? }}
\begin{shaded}
Provide a proof here.
\end{shaded}
\item Refer back to the formal, 5-tuple definition of a DFA given in Problem Set Six. Define an 142-
state DFA for $TWEETS$ using the formal 5-tuple definition. Briefly explain how your DFA works.
No formal proof is necessary.
\textit{\textcolor{blue}{ Again, this might be a lot easier to do if you first reduce 140 and 142 to 2 and 4, respectively, and see what
you come up with. Start by drawing out what the DFA would look like, then think about how you'd formalize
your idea as a 5-tuple. }}
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\end{enumerate}
Your results from parts (ii) and (iii) collectively establish that the smallest possible DFA for $TWEETS$ has
exactly 142 states. This approach to finding the smallest object of some type - using some theorem to
prove a lower bound (``we need at least this many states'') combined with a specific object of the given
type (``we certainly can't do worse than this'') is a common strategy in discrete mathematics, algorithm design,
and computational complexity theory. If you take classes like CS161, CS254, etc., you?ll likely see
similar sorts of approaches!
\section{Closure Properties Revisited}
When building up the regular expressions, we explored several closure properties of the regular languages.
This problem explores some of their nuances. \\
The regular languages are closed under complementation: If $L$ is regular, so is $\overline{L}$.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Prove or disprove: the \textit{nonregular} languages are closed under complementation.
\textit{\textcolor{blue}{ In other words, if you have a nonregular language L, is $\overline{L}$ necessarily regular? }}
\begin{shaded}
Provide a proof or disproof here.
\end{shaded}
\end{enumerate}
The regular languages are closed under union: If $L_1$ and $L_2$ are regular, so is $L_1 \cup L_2$.
\begin{enumerate}[resume*]
\item Prove or disprove: the \textit{nonregular} languages are closed under union.
\begin{shaded}
Provide a proof or disproof here.
\end{shaded}
\end{enumerate}
We know that the union of any two regular languages is regular. Using induction, we can show that the
union of any finite number of regular languages is also regular. As a result, we say that the regular languages
are closed under \textit{\textbf{finite union}}. \\
An infinite union is the union of infinitely many sets. Formally speaking, imagine that you have a sequence
of infinitely many sets $S_0, S_1, S_2, \ldots$, one for each natural number. Then the \textit{\textbf{infinite union}} of those
sets is defined as follows:
\begin{equation*}
\bigcup\limits_{i=0}^{\infty} S_i = \{\ w\ |\ \exists n \in \N.\ w \in S_n\ \}.
\end{equation*}
For example, if $L$ is a language, then $L^*$, which we defined as $\{\ w \in \Sigma^*\ |\ \exists n \in \N.\ w \in L^n\ \}$, can be thought of as the infinite union
\begin{equation*}
\bigcup\limits_{i=0}^{\infty} L^i.
\end{equation*}
Similarly, the set of all rational numbers, $\Q$, can be thought of as
\begin{equation*}
\Q = \bigcup\limits_{i=0}^{\infty} \{\ \frac{x}{i+1}\ |\ x \in \Z\ \}.
\end{equation*}
\begin{enumerate}[resume*]
\item Prove or disprove: the regular languages are closed under infinite union.
\textit{\textcolor{blue}{ In other words, if you have an infinite list of regular languages $L_0, L_1, L_2, \ldots$, is their infinite union necessarily a regular language? }}
\begin{shaded}
Provide a proof or disproof here.
\end{shaded}
\end{enumerate}
\section{Generalized Fooling Sets (1 Point Extra Credit)}
In Problem Eight, you saw how to use distinguishability to lower-bound the size of DFAs for a particular
language. Unfortunately, distinguishability is not a powerful enough technique to lower-bound the sizes of
NFAs. In fact, it's in general quite hard to bound NFA sizes; there's a \$1,000,000 prize for anyone who
finds a polynomial-time algorithm that, given an arbitrary NFA, converts it to the smallest possible equivalent
NFA! \\
Although it's generally difficult to lower-bound the sizes of NFAs, there are some techniques we can use
to find lower bounds on the sizes of NFAs. Let $L$ be a language over $\Sigma$. A \textit{\textbf{generalized fooling set}} for $L$ is
a set $\mathscr{F} \subseteq \Sigma^* \times \Sigma^*$ is a set with the following properties:
\begin{itemize}
\item For any $(x, y) \in \mathscr{F}$, we have $xy \in L$.
\item For any distinct pairs $(x_1, y_1), (x_2, y_2) \in \mathscr{F}$, we have $x_1y_2 \not\in L$ or $x_2y_1 \not\in L$ (this is an inclusive OR.)
\end{itemize}
Prove that if $L$ is a language and there is a generalized fooling set $\mathscr{F}$ for $L$ that contains $n$ pairs of strings,
then any NFA for $L$ must have at least $n$ states.
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{document}