%
% LaTeX Problem Set Template by Sachin Padmanabhan
% I created this when I was a freshman in CS 103,
% and I continue to use it to this day.
%
% Hope you enjoy!
%
% There may be problems with this template.
% If so, feel free to contact me.
%
\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{mathdots}
\usepackage[pdftex]{graphicx}
\usepackage{fancyhdr}
\usepackage[margin=1in]{geometry}
\usepackage{multicol}
\usepackage{bm}
\usepackage{listings}
\PassOptionsToPackage{usenames,dvipsnames}{color} %% Allow color names
\usepackage{pdfpages}
\usepackage{algpseudocode}
\usepackage{tikz}
\usetikzlibrary{automata,positioning}
\usepackage{enumitem}
\usepackage[T1]{fontenc}
\usepackage{inconsolata}
\usepackage{framed}
\usepackage{wasysym}
\usepackage[thinlines]{easytable}
\usepackage{wrapfig}
\usepackage{hyperref}
\usepackage{mathrsfs}
\usepackage{tabularx} % also loads 'array' package
\newcolumntype{C}{>{\centering\arraybackslash}X} % centered version of 'X' columns
\usepackage{diagbox}
\title{CS 103: Mathematical Foundations of Computing\\Problem Set \#7}
\author{Sachin Padmanabhan}
\date{\today}
\lhead{Sachin Padmanabhan}
\chead{Problem Set \#7}
\rhead{\today}
\lfoot{}
\cfoot{CS 103: Mathematical Foundations of Computing --- Fall 2017}
\rfoot{\thepage}
\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\absfit}[1]{\left\lvert #1 \right\rvert}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\eval}[3]{\left[#1\right]_{#2}^{#3}}
\renewcommand{\(}{\left(}
\renewcommand{\)}{\right)}
\newcommand{\floor}[1]{\left\lfloor#1\right\rfloor}
\newcommand{\ceil}[1]{\left\lceil#1\right\rceil}
\newcommand{\pd}[1]{\frac{\partial}{\partial #1}}
\newcommand{\inner}[1]{\langle#1\rangle}
\newcommand{\cond}{\bigg|}
\newcommand{\rank}[1]{\mathbf{rank}(#1)}
\newcommand{\range}[1]{\mathbf{range}(#1)}
\newcommand{\nullsp}[1]{\mathbf{null}(#1)}
\newcommand{\repr}[1]{\left\langle#1\right\rangle}
\newcommand\VRule[1][\arrayrulewidth]{\vrule width #1}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator{\Tr}{\mathbf{Tr}}
\DeclareMathOperator{\diag}{\mathbf{diag}}
\DeclareMathOperator{\dist}{\mathbf{dist}}
\DeclareMathOperator{\prob}{\mathbf{prob}}
\DeclareMathOperator{\dom}{\mathbf{dom}}
\DeclareMathOperator{\E}{\mathbf{E}}
\DeclareMathOperator{\Q}{\mathbb{Q}}
\DeclareMathOperator{\R}{\mathbb{R}}
\DeclareMathOperator{\var}{\mathbf{var}}
\DeclareMathOperator{\quartile}{\mathbf{quartile}}
\DeclareMathOperator{\conv}{\mathbf{conv}}
\DeclareMathOperator{\VC}{VC}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\Ber}{Bernoulli}
\DeclareMathOperator{\NP}{\mathbf{NP}}
\DeclareMathOperator{\coNP}{\mathbf{coNP}}
\DeclareMathOperator{\TIME}{\mathsf{TIME}}
\DeclareMathOperator{\polytime}{\mathbf{P}}
\DeclareMathOperator{\PH}{\mathbf{PH}}
\DeclareMathOperator{\SIZE}{\mathbf{SIZE}}
\DeclareMathOperator{\ATIME}{\mathbf{ATIME}}
\DeclareMathOperator{\SPACE}{\mathbf{SPACE}}
\DeclareMathOperator{\ASPACE}{\mathbf{ASPACE}}
\DeclareMathOperator{\NSPACE}{\mathbf{NSPACE}}
\DeclareMathOperator{\Z}{\mathbb{Z}}
\DeclareMathOperator{\N}{\mathbb{N}}
\DeclareMathOperator{\EXP}{\mathbf{EXP}}
\DeclareMathOperator{\NEXP}{\mathbf{NEXP}}
\DeclareMathOperator{\NTIME}{\mathbf{NTIME}}
\DeclareMathOperator{\DTIME}{\mathbf{DTIME}}
\DeclareMathOperator{\poly}{poly}
\DeclareMathOperator{\BPP}{\mathbf{BPP}}
\DeclareMathOperator{\ZPP}{\mathbf{ZPP}}
\DeclareMathOperator{\RP}{\mathbf{RP}}
\DeclareMathOperator{\coRP}{\mathbf{coRP}}
\DeclareMathOperator{\BPL}{\mathbf{BPL}}
\DeclareMathOperator{\IP}{\mathbf{IP}}
\DeclareMathOperator{\PSPACE}{\mathbf{PSPACE}}
\DeclareMathOperator{\NPSPACE}{\mathbf{NPSPACE}}
\DeclareMathOperator{\SAT}{\mathsf{SAT}}
\DeclareMathOperator{\NL}{\mathbf{NL}}
\DeclareMathOperator{\PCP}{\mathbf{PCP}}
\DeclareMathOperator{\PP}{\mathbf{PP}}
\DeclareMathOperator{\cost}{cost}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbf{Pr}}
\definecolor{shadecolor}{gray}{0.9}
\theoremstyle{plain}
\newtheorem*{lem}{Lemma}
\theoremstyle{plain}
\newtheorem*{claim}{Claim}
\theoremstyle{definition}
\newtheorem*{answer}{Answer}
\newtheorem{theorem}{Theorem}[section]
\newtheorem*{thm}{Theorem}
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0.4pt}
\setlength{\parindent}{0pt}
\pagestyle{fancy}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\begin{document}
\maketitle{\vspace{-1.0cm}}
\section{Designing Regular Expressions}
Below are a list of alphabets and languages over those alphabets. For each language, write a regular expression
for that language. \\
\textit{\textbf{Please use our online tool to design, test, and submit your regular expressions. Typed or handwritten
solutions will not be accepted.}} To use it, visit the CS103 website and click the ``Regex Editor'' link under
the ``Resources'' header. As before, if you submit in a pair, please make a note in your GradeScope submission
of which partner submitted your answers to this question so that we know where to look. Also, as
a reminder, please test your submissions thoroughly, since we'll be grading them with an autograder.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ w\text{ does not contain }ba\text{ as a substring }\}$. Write a regular expression for $L$.
\item Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ w\text{ does not contain }bb\text{ as a substring }\}$. Write a regular expression for $L$.
\item Suppose you are taking a walk with your dog on a leash of length two. Let $\Sigma = \{y, d\}$ and let
$L = \{\ w \in \Sigma^*\ |\ w$ represents a walk with your dog on a leash where you and your dog both end up
at the same location $\}$. For example, the string $yyddddyy$ is in $L$ because you and your dog are
never more than two steps apart and both of you end up four steps ahead of where you started;
similarly, $ddydyy \in L$. However, $yyyyddd \not\in L$, since halfway through your walk you are three
steps ahead of your dog; $ddyd \not\in L$, because your dog ends up two steps ahead of you; and the
string $ddyddyyy \not\in L$, because at one point in your walk your dog is three steps ahead of you.
Write a regular expression for $L$.
\item Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ w \ne ab\ \}$. Write a regular expression for $L$.
\item Let $\Sigma = \{\textsc{M, D, C, L, X, V, I}\}$ and let $L = \{\ w \in \Sigma^*\ |\ w$ is number less than 2,000 represented in Roman numerals $\}$. For example, $\textsc{CMXCIX} \in L$, since it represents the number 999, as are the strings
L (50), VIII (8), DCLXVI (666), CXXXVII (137), CDXII (412), and MDCXVIII (1,618). However,
we have $\textsc{VIIII} \not\in L$ (you'll never have four I's in a row; use IX or IV instead), that $\textsc{MMI} \not\in L$ (it's a
Roman numeral, but it's for 2,001, which is too large), that $\textsc{VX} \not\in L$ (this isn't a valid Roman numeral),
and that $\textsc{IM} \not\in L$ (the notation of using a smaller digit to subtract from a larger one only lets
you use I to prefix V and X, or X to prefix L and C, or C to prefix D and M). The Romans didn't have
a way of expressing the number 0, so to make your life easier we'll say that $\varepsilon \in L$ and that the
empty string represents 0. (Oh, those silly Romans.) Write a regular expression for L. \\
(As a note, we're using the ``standard form'' of Roman numerals. You can see a sample of numbers
written out this way via \href{http://literacy.kent.edu/Minigrants/Cinci/romanchart.htm}{\underline{this link}}.)
\end{enumerate}
\begin{shaded}
Write the SUNetID of the person who submitted the regular expressions here.
\end{shaded}
\section{Finite and Cofinite Languages}
A language $L$ is called \textit{finite} if $L$ contains finitely many strings. More precisely, a language $L$ is a finite
language if $|L|$ is a natural number. A language $L$ is called \textit{cofinite} if its complement is a finite language;
that is, $L$ is cofinite if $|\overline{L}|$ is a natural number.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Prove that any finite language is regular.
\begin{shaded}
Provide a proof here.
\end{shaded}
\item Prove that any cofinite language is regular.
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
\newpage
\section{State Elimination}
The state elimination algorithm gives a way to transform a finite automaton (DFA or NFA) into a regular
expression. It's a really beautiful algorithm once you get the hang of it, so we thought that we'd let you try
it out on a particular example. \\
Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ w$ has an even number of $a$'s and an even number of $b$'s $\}$. Below is a
finite automaton for $L$ that we've prepared for the state elimination algorithm by adding in a new start
state $q_{start}$ and a new accept state $q_{acc}$: \\
\begin{tikzpicture}[shorten >=1pt,node distance=2.2cm,on grid,auto, state/.style={circle, draw, minimum size=1.2cm}]
% \node[state, options?] (identifier) [placement] {display label};
\node[state, initial] (s) {$q_{start}$};
\node[state] (0) [right=of s] {$q_0$};
\node[state] (1) [right=of 0] {$q_1$};
\node[state, accepting] (a) [above=of 0] {$q_{acc}$};
\node[state] (2) [below=of 0] {$q_2$};
\node[state] (3) [below=of 1] {$q_3$};
% (start identifier) edge [options?] node {display label} (target identifier)
% edge [options?] node {display label} (another target identifier)
\path[->]
(s) edge node {$\varepsilon$} (0)
(0) edge [bend left=10] node {$a$} (1)
edge node {$\varepsilon$} (a)
edge [bend left=10] node {$b$} (2)
(1) edge [bend left=10] node {$a$} (0)
edge [bend left=10] node {$b$} (3)
(2) edge [bend left=10] node {$b$} (0)
edge [bend left=10] node {$a$} (3)
(3) edge [bend left=10] node {$b$} (1)
edge [bend left=10] node {$a$} (2);
\end{tikzpicture} \\
We'd like you to use the state elimination algorithm to produce a regular expression for $L$.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Run two steps of the state elimination algorithm on the above automaton. Specifically, first remove
state $q_1$, then remove state $q_2$. Show your result at this point.
\begin{shaded}
Provide the resulting automaton here.
% You can copy and modify the above automaton drawn using the tikz automata package,
% or use a finite automata editor such as http://madebyevan.com/fsm/
% If you choose to draw and then scan/take a picture, *please* make sure that the image is clearly legible in Gradescope!
\end{shaded}
\item Finish the state elimination algorithm. What regular expression do you get for $L$?
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\item Without making reference to the original automaton given above, give an intuitive explanation for
how the regular expression you found in part (ii) works.
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\end{enumerate}
\section{Distinguishable Strings}
The Myhill-Nerode theorem is one of the trickier and more nuanced theorems we've covered this quarter.
This question explores what the theorem means and, importantly, what it \textit{doesn't} mean. \\
Let $\Sigma = \{a, b\}$ and let $L = \{\ w \in \Sigma^*\ |\ |w|\text{ is even }\}$.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Show that $L$ is a regular language.
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\item Prove that there is a infinite set $S \subseteq \Sigma^*$ where there are infinitely many pairs of distinct strings
$x, y \in S$ such that $x \not\equiv_L y$.
\begin{shaded}
Provide a proof here.
\end{shaded}
\item Prove that there is \textit{no} infinite set $S \subseteq \Sigma^*$ where all pairs of distinct strings $x, y \in S$ satisfy $x \not\equiv_L y$.
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
The distinction between parts (ii) and (iii) is important for understanding the Myhill-Nerode theorem. A
language is nonregular not if you can find infinitely many pairs of distinguishable strings, but rather if you
can find infinitely many strings that are all \textit{pairwise} distinguishable. This is a subtle distinction, but it's an
important one!
\section{Balanced Parentheses}
Consider the following language over $\Sigma = \{(, )\}$:
\begin{equation*}
L_1 = \{\ w \in \Sigma^*\ |\ w\text{ is a string of balanced parentheses }\}
\end{equation*}
For example, we have $() \in L_1$, $(()) \in L_1$, $(()())() \in L_1$, $\varepsilon \in L_1$, and $(())((()())) \in L_1$, but $)(\ \not\in L_1$,
$(() \not\in L_1$, and $((()))) \not\in L_1$. This question explores properties of this language.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Prove that $L_1$ is not a regular language. One consequence of this result - which you don't need to
prove - is that most languages that support some sort of nested parentheses, such as most programming
languages and HTML, aren't regular and so can't be parsed using regular expressions.
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
Let's say that the \textit{nesting depth} of a string of balanced parentheses is the maximum number of unmatched
open parentheses at any point inside the string. For example, the string $((()))$ has nesting depth three,
the string $(()())()$ has nesting depth two, and the string $\varepsilon$ has nesting depth zero. \\
Consider the language $L_2 = \{\ w \in \Sigma^*\ |\ w$ is a string of balanced parentheses and $w$'s nesting depth is at
most four $\}$. For example, $((())) \in L_2$, $(()()) \in L_2$, and $(((())))(()) \in L_2$, but $((((())))) \not\in L_2$ because
although it's a string of balanced parentheses, the nesting goes five levels deep.
\begin{enumerate}[resume*]
\item Design a DFA for $L_2$, showing that $L_2$ is regular. A consequence of \textit{this} result is that while you
can't parse all programs or HTML with regular expressions, you can parse programs with low
nesting depth or HTML documents without deeply-nested tags using regexes. \textit{\textbf{Please submit this
DFA using the DFA editor on the course website}} and tell us on GradeScope who submitted it.
\begin{shaded}
Write the SUNetID of the person who submitted the DFA here.
\end{shaded}
\item Look back at your proof from part (i) of this problem. Imagine that you were to take that exact
proof and blindly replace every instance of ``$L_1$'' with ``$L_2$.'' This would give you a (incorrect) proof
that $L_2$ is nonregular (which we know has to be wrong because $L_2$ is indeed regular.) Where would
the error be in that proof? Be as specific as possible.
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\item Intuitively, regular languages correspond to problems that can be solved using only finite memory.
Using this intuition and without making reference to DFAs, NFAs, or the Myhill-Nerode theorem,
explain why $L_1$ is nonregular while $L_2$ is regular.
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\end{enumerate}
\section{Tautonyms}
A \textit{\textbf{tautonym}} is a word that consists of the same string repeated twice. For example, ``caracara'' and
``dikdik'' are all tautoynms (the first is a bird, the second the cutest animal you'll ever see), as is ``hotshots''
(people who aren't very fun to be around). Let $\Sigma = \{a, b\}$ and consider the following language:
\begin{equation*}
L = \{\ ww\ |\ w \in \Sigma^*\ \}
\end{equation*}
This is the language of all tautonyms over $\Sigma$. Below is an \textit{\textbf{incorrect}} proof that $L$ is not regular:
\begin{quote}
\begin{proof}
Let $S = \{\ a^n\ |\ n \in \N\ \}$. This set is infinite because it contains one string for each natural
number. We claim that any two strings in $S$ are distinguishable relative to $L$. To see this, consider
any two distinct strings $a^m$ and $a^n$ in the set $S$, where $m \ne n$. Then $a^ma^m \in L$ but $a^na^m \not\in L$,
so $a^m \not\equiv_L a^n$. This means that $S$ is an infinite set of strings that are pairwise distinguishable relative
to $L$. Therefore, by the Myhill-Nerode theorem, $L$ is not regular.
\end{proof}
\end{quote}
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item What's wrong with this proof? Be as specific as possible.
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\item Although the above proof is incorrect, the language $L$ isn't regular. Prove this. Please make sure
that your proof doesn't make the same error that you identified above!
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
\section{State Lower Bounds}
The Myhill-Nerode theorem we proved in lecture is actually a special case of a more general theorem
about regular languages that can be used to prove lower bounds on the number of states necessary to construct
a DFA for a given language.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Let $L$ be a language over $\Sigma$. Suppose there's a \textit{finite} set $S$ such that any two distinct strings $x, y \in S$
are distinguishable relative to $L$ (that is, $x \not\equiv_L y$). Prove that any DFA for $L$ must have at least $|S|$
states. (You sometimes hear this referred to as \textit{lower-bounding} the size of any DFA for $L$.)
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{enumerate}
According to old-school Twitter rules, all tweets need to be 140 characters or less. Let $\Sigma$ be the alphabet
of characters that can legally appear in a tweet and consider the following language:
\begin{equation*}
TWEETS = \{\ w \in \Sigma^*\ |\ |w| \le 140\ \}.
\end{equation*}
This is the language of all legal tweets. The good news is that this language is regular. The bad news is
that any DFA for it has to be pretty large.
\begin{enumerate}[resume*]
\item Using your result from part (i), prove that any DFA for $TWEETS$ must have at least 142 states.
\begin{shaded}
Provide a proof here.
\end{shaded}
\item Refer back to the formal, 5-tuple definition of a DFA given in Problem Set Six. Define an 142-
state DFA for $TWEETS$ using the formal 5-tuple definition. Briefly explain how your DFA works.
No formal proof is necessary.
\begin{shaded}
\begin{answer}
Provide an answer here.
\end{answer}
\end{shaded}
\end{enumerate}
Your results from parts (ii) and (iii) collectively establish that the smallest possible DFA for $TWEETS$ has
exactly 142 states. This approach to finding the smallest object of some type - using some theorem to
prove a lower bound (``we need at least this many states'') combined with a specific object of the given
type (``we certainly can't do worse than this'') is a common strategy in discrete mathematics, algorithm design,
and computational complexity theory. If you take classes like CS161, CS254, etc., you?ll likely see
similar sorts of approaches!
\section{Closure Properties Revisited}
When building up the regular expressions, we explored several closure properties of the regular languages.
This problem explores some of their nuances. \\
The regular languages are closed under complementation: If $L$ is regular, so is $\overline{L}$.
\begin{enumerate}[label*=\roman*.,ref=\roman*]
\item Prove or disprove: the \textit{nonregular} languages are closed under complementation.
\begin{shaded}
Provide a proof or disproof here.
\end{shaded}
\end{enumerate}
The regular languages are closed under union: If $L_1$ and $L_2$ are regular, so is $L_1 \cup L_2$.
\begin{enumerate}[resume*]
\item Prove or disprove: the \textit{nonregular} languages are closed under union.
\begin{shaded}
Provide a proof or disproof here.
\end{shaded}
\end{enumerate}
We know that the union of any two regular languages is regular. Using induction, we can show that the
union of any finite number of regular languages is also regular. As a result, we say that the regular languages
are closed under \textit{finite union}. \\
An \textit{infinite union} is the union of infinitely many sets. For example, the rational numbers can be expressed
as the infinite union $\{\ \frac{x}{1}\ |\ x \in \Z\ \} \cup \{\ \frac{x}{2}\ |\ x \in \Z\ \} \cup \{\ \frac{x}{3}\ |\ x \in \Z\ \} \cup \ldots$ out to infinity.
\begin{enumerate}[resume*]
\item Prove or disprove: the regular languages are closed under infinite union.
\begin{shaded}
Provide a proof or disproof here.
\end{shaded}
\end{enumerate}
\section{Generalized Fooling Sets (1 Point Extra Credit)}
In Problem Seven, you saw how to use distinguishability to lower-bound the size of DFAs for a particular
language. Unfortunately, distinguishability is not a powerful enough technique to lower-bound the sizes of
NFAs. In fact, it's in general quite hard to bound NFA sizes; there's a \$1,000,000 prize for anyone who
finds a polynomial-time algorithm that, given an arbitrary NFA, converts it to the smallest possible equivalent
NFA! \\
Although it's generally difficult to lower-bound the sizes of NFAs, there are some techniques we can use
to find lower bounds on the sizes of NFAs. Let $L$ be a language over $\Sigma$. A \textit{\textbf{generalized fooling set}} for $L$ is
a set $\mathscr{F} \subseteq \Sigma^* \times \Sigma^*$ is a set with the following properties:
\begin{itemize}
\item For any $(x, y) \in \mathscr{F}$, we have $xy \in L$.
\item For any distinct pairs $(x_1, y_1), (x_2, y_2) \in \mathscr{F}$, we have $x_1y_2 \not\in L$ or $x_2y_1 \not\in L$ (this is an inclusive OR.)
\end{itemize}
Prove that if $L$ is a language and there is a generalized fooling set $\mathscr{F}$ for $L$ that contains $n$ pairs of strings,
then any NFA for $L$ must have at least $n$ states.
\begin{shaded}
Provide a proof here.
\end{shaded}
\end{document}