Problem Set 6

Due Friday, November 5 at 2:30 pm Pacific

Solutions are available!

🔑 Solutions

This sixth problem set explores formal language theory, finite automata, the regular languages, and their properties. This will be your first foray into computability theory, and I hope you find it fun and exciting!

As always, please feel free to drop by office hours, ask on Ed, or email the staff list if you have any questions. We'd be happy to help out.

Good luck, and have fun!

Starter Files

Here are the starter files for the problems you'll submit electronically:

📦 PS6 Starter Files

Unpack the files somewhere convenient and open up the bundled project.

If you would like to type your solutions in $\LaTeX$, you may want to download this template file where you can place your answers:

🖋 PS6 $\LaTeX$ Template

Problem One: Constructing DFAs

Download the starter files for Problem Set Six and extract them somewhere convenient on your system. Run the program to access the automaton editor, which you’ll use throughout the problem set. The provided program offers this functionality:

Automaton Editor: Like the Graph Editor from PS4, use this tool to design automata.
Automaton Tester: A tool you can use to test your automata on various input strings. Enter strings in the text box on the left, and the tool will show you which of those strings your automata accepts and which of those strings it rejects. You can follow each string with what you expect the automaton to do, and the tool will tell you whenever the expected output doesn’t match the actual output. For example, you might start with these test strings for the automaton in part (i):
- BBIIB yes
- BBB no
Automaton Debugger: A tool for single-stepping through the operation of an automaton on a string. You can use this to see how your automata process individual strings.

There’s also an automated test suite you can use to check your work.

For each of the following languages over the indicated alphabets, construct a DFA that accepts precisely the strings that are in the indicated language. Your DFA does not have to have the fewest number of states possible, though for your own edification we’d recommend trying to construct the smallest DFAs possible. Use the Automaton Editor to compose your answers, and submit your finished automata to Gradescope.

There are many ways to tile a $2 \times 8$ checkerboard with dominoes, two of which are shown here:

Notice that the horizontal dominoes must appear as stacked pairs (do you see why?) We can read each tiling from left to right as a string made from the characters I and B, where I denotes “a vertical domino” and B denotes “two horizontal dominoes.” The top tiling here would be represented as BIBIII and the bottom tiling as IBBBI.

Let $\Sigma$ be the alphabet $\Set{\mathtt{B}, \mathtt{I}}$. Construct a DFA for the language { $w \in \Sigma^* \suchthat w$ represents a domino tiling of a $2 \times 8$ checkerboard }.

You’re taking a walk with your dog along a straight-line path. Your dog is on a leash of length two, so the distance between you and your dog can be at most two units. You and your dog start at the same position. Consider the alphabet $\Sigma = \Set{\mathtt{y}, \mathtt{d}}$. A string in $\Sigma^*$ can be thought of as a series of events in which either you or your dog moves forward one unit. For example, the string yydd means you take two steps forward, then your dog takes two steps forward.

Let $L$ be the language { $w \in \Sigma^* \suchthat w$ describes a series of steps where you and your dog are never more than two units apart }. Construct a DFA for $L$.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$. Construct a DFA for the language $L$ = { $w \in \Sigma^* \suchthat w$ contains the same number of instances of the substring ab and the substring ba }. Note that substrings are allowed to overlap, so we have $\mathtt{aba} \in L$ (one copy of each substring) and $\mathtt{babab} \in L$ (two copies of each substring).

Let $\Sigma = \Set{\mathtt{a}, \mathtt{c}, \mathtt{m}, \mathtt{o}}$. Construct a DFA for the language $L$ = { $w \in \Sigma^* \suchthat w$ contains the word cocoa as a substring }. For example, $\mathtt{mm\underline{cocoa}mm} \in L$ and $\mathtt{\underline{cocoa}} \in L$, but $\mathtt{c} \notin L$, $\mathtt{cmomcmoma} \notin L$ (though cocoa is a subsequence of cmomcmoma, it’s not a substring), and $\varepsilon \notin L$.

Some trickier cases to watch for: $\mathtt{ccc\underline{cocoa}} \in L$, and $\mathtt{co\underline{cocoa}} \in L$.

Problem Two: Constructing NFAs

For each of the following languages over the indicated alphabets, use the Automaton Editor to design an NFA that accepts precisely the strings that are in the indicated language. As before, while you don’t have to design the smallest NFAs possible, we recommend that you try to keep your NFAs small both to make testing easier and for your own edification.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{c}}$. Construct an NFA for { $w \in \Sigma^* \suchthat w$ ends in a, bb, or ccc }.

While it’s possible to do this completely deterministically, it’s a bit easier if you use the “guess-and-check” framework we talked about in class.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{c}, \mathtt{d}, \mathtt{e}}$. Construct an NFA for the language $L$ = { $w \in \Sigma^* \suchthat $ the letters in $w$ are sorted alphabetically }. For example, $\mathtt{abcde} \in L$, $\mathtt{bee} \in L$, $\mathtt{a} \in L$ and $\varepsilon \in L$, but $\mathtt{decade} \notin L$.

You could do this deterministically, but that will require a lot of transitions. You can dramatically reduce that number by using ε-transitions strategically.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{c}, \mathtt{d}, \mathtt{e}}$. Construct an NFA for the language { $w \in \Sigma^* \suchthat$ the last character of $w$ appears nowhere else in $w$, and $\abs{w} \ge 1$ }.

This problem is all about embracing nondeterminism. Use the “guess-and-check” framework. What information would you want to guess? How would you check it? Please don’t try to solve this problem by building a DFA for this language; you’ll need at least 50 states if you tried to approach things this way.

Stuck? Try reducing the alphabet to two or three letters and see if you can solve that version.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$. Construct an NFA for the language $L$ = { $w \in \Sigma^* \suchthat w$ contains at least two b's with exactly five characters between them }. For example, $\mathtt{\underline{b}aaaaa\underline{b}} \in L$, $\mathtt{aabaa\underline{b}aaabb\underline{b}} \in L$, and $\mathtt{a\underline{b}bbbba\underline{b}aaaaaaab} \in L$, but $\mathtt{bbbbb} \notin L$, $\mathtt{bbbab} \notin L$, and $\mathtt{aaabab} \notin L$.

The smallest DFA for this language is huge, so please don’t solve this one deterministically. Embrace the nondeterminism! What would you like to guess? How would you check that your guess is right?

Problem Three: $\powerset{\Sigma^*}$

Let $\Sigma$ be an alphabet. Give a short English description of the set $\powerset{\Sigma^*}$.

We think that there is a single “best answer.” You should be able to describe the set in at most ten words.

Problem Four: Much Ado About Nothing, Part II

We’ve covered a bunch of terminology in the past week. This problem is designed to help you make sense of what all these terms mean.

Are there any languages $L$ where $\varepsilon \in L$? If so, give an example of one. If not, explain why not.

Are there any languages $L$ where $\varepsilon \notin L$? If so, give an example of one. If not, explain why not.

Are there any languages $L$ where $\varepsilon \subseteq L$? If so, give an example of one. If not, explain why not.

Are there any languages $L$ where $\varepsilon \not\subseteq L$? If so, give an example of one. If not, explain why not.

Does $\emptyset = \varepsilon$? Briefly explain your answer.

Does $\emptyset = \Set{\varepsilon}$? Briefly explain your answer.

Problem Five: Hard Resets

A hard reset string for a DFA is a string $w$ with the following property: starting from any state in the DFA, if you read $w$, you end up in the DFA's start state.

Hard reset strings have many practical applications. For example, suppose you're remotely controlling a Mars rover whose state you're modeling as a DFA. Imagine there's a hardware glitch that puts the Mars rover into a valid but unknown state. Since you can't physically go to Mars to pick up the rover and fix it, the only way to change the rover's state would be to issue it new commands. To recover from this mishap, you could send the rover a hard reset string. Regardless of what state the rover got into, this procedure would guarantee that it would end up in its initial configuration.

Here is an algorithm that, given any DFA, will let you find every hard reset string for that DFA:

Add a new start state $q_s$ to the automaton with $\varepsilon$-transitions to every state in the DFA.
Perform the subset construction on the resulting NFA to produce a new DFA called the power automaton.
If the power automaton contains a state corresponding solely to the original DFA's start state, make that state the only accepting state in the power automaton. Otherwise, make every state in the power automaton a rejecting state.

This process produces a new automaton that accepts all the hard reset strings of the original DFA. It's possible that a DFA won't have any hard reset strings (for example, if it contains a dead state), in which case the new DFA won't accept anything.

Apply the above algorithm to the following DFA and give us a hard reset string for that DFA. For simplicity, please give the subset-constructed DFA as a transition table rather than a state-transition diagram. We've given you space for the table over to the right, and to be nice, we've given you exactly the number of rows you'll need.

\[\begin{array}{|c|c|c|} \hline \text{state} & \mathtt{a} & \mathtt{b} \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline \end{array}\]

Sample hard reset string: $\blank$.

Problem Six: Induction on Strings

Why does induction work? One perspective that will be useful here is the following: if $n$ is a natural number, then either

$n = 0$, or
there is some $k \in \naturals$ where $n = k + 1$.

The base case of a proof by induction handles this first case, and the inductive step of the proof handles the second case. Collectively, this covers all natural numbers.

There's a similar rule that works for strings. If $w \in \Sigma^*$ is a string over $\Sigma$, then either

$w = \varepsilon$, or
there is some $x \in \Sigma^*$ and $a \in \Sigma$ where $w = xa$.

(Can you see how $\varepsilon$ kinda sorta ish plays the role of zero? Can you see why that second step kinda sorta ish corresponds to adding one?) This inductive definition of strings allows us to write recursive functions that process strings.

Let $\Sigma = \Set{\mathtt{r}, \mathtt{s}}$ and consider the function $\rho : \Sigma^* \to \Sigma^*$ defined as follows:

\[\rho(w) = \begin{cases} \varepsilon & \text{if } w = \varepsilon \\ a\rho(x) & \text{if } w = xa \text{ for some } x \in \Sigma^* \text{ and } a \in \Sigma \end{cases}\]

To make sure the notation here is clear, the expression $a\rho(x)$ means "the concatenation of $a$ and $\rho(x)$."

Fill in the following blanks. No justification is required.
1. $\rho(\mathtt{r}) = $ $\blank$.
2. $\rho(\mathtt{sr}) = $ $\blank$.
3. $\rho(\mathtt{rrs}) = $ $\blank$.
4. $\rho(\mathtt{r}^{100}\mathtt{s}^{100}) = $ $\blank$.

Because this function on strings is defined recursively, to formally reason about this function we’ll use a type of induction that works on strings. We begin by defining a predicate $P(w)$ that takes a string $w$ as an input rather than a natural number $n$. We then prove two statements:

Our base case: $P(\varepsilon)$.
Our inductive step: $\forall x \in \Sigma^*. \forall a \in \Sigma. (P(x) \to P(xa))$.

(Do you see how the base case in string induction is kinda sorta ish like proving $P(0)$ in regular induction? Do you see how the inductive step in string induction is kinda sorta ish like assuming $P(k)$ and proving $P(k+1)$ in regular induction? You don’t need to submit your answers to these questions, but you should make sure you’re rock solid on them before moving on.)

Fill in the blanks below to complete the following proof that $\abs{\rho(w)} = \abs{w}$ for all strings $w$. As a reminder, the notation $\abs{w}$ denotes the length of the string $w$.

Theorem: For any string $w \in \Sigma^\star$, we have $\abs{\rho(w)} = \abs{w}$.

Proof: Let $P(w)$ be the statement "$\blank$." We wlil prove by induction that $P(w)$ holds for all $w \in \Sigma^\star$, from which the theorem follows.

As a base case, we prove $P(\blank)$, namely, that $\blank$. $\blank$.

For our inductive step, pick some $x \in \Sigma^*$ and $a \in \Sigma$ and assume $P(\blank)$ is true, meaning that $\blank$. We need to show $P(\blank)$ is true, that $\blank$. To see this, note that
\[\begin{aligned} \abs{\rho(\blank)} &= \abs{\blank} \\ &= \blank + 1 \\ &= \blank + 1 && \text{(by our IH)} \\ &= \blank \text. \\ \end{aligned}\]
This means that $\blank$, so $P(\blank)$ holds, completing the induction. $\qed$

As usual, the relative sizes of the blanks do not necessarily indicate how much you need to write.

Now, we’d like you to prove the following result:

Theorem: For all strings $w, z \in \Sigma^*$, we have $\rho(zw) = \rho(w)\rho(z)$.

This result involves two different strings $w$ and $z$, so we’ll use a standard induction technique. Let’s choose this predicate $P(w)$:

$P(w)$ is the statement “for any $z \in \Sigma^*$, we have $\rho(zw) = \rho(w)\rho(z)$.”

In other words, the predicate takes a single argument $w$, then internally has a universal quantifier on $z$.

Briefly explain why if $P(w)$ is true for all $w \in \Sigma^*$, then the theorem is true.

Prove the theorem using a proof by induction with the specific choice of predicate $P$ given above.

The style of induction here is closely related to a powerful technique called structural induction that can also be used to prove results about trees, computer programs, and more. Take CS258 for more details!

Problem Seven: DFAs, Formally

We've drawn DFAs both as state-transition diagrams (with circles for states and arrows for transitions) and as tables with rows for states and columns for characters. But what exactly is a DFA, in a mathematical sense? Formally speaking, a DFA is a 5-tuple $(Q, \Sigma, \delta, q_0, F)\text,$ where

$Q$ is a finite set, the elements of which we call states;
$Σ$ is a finite, nonempty set, the elements of which we call characters;
$\delta : Q \times Σ \to Q$ is the transition function, described below ($\delta$ is a lower-case Greek delta);
$q_0 \in Q$ is the start state; and
$F \subseteq Q$ is the set of accepting states.

When drawing DFAs, we've represented transitions either by arrows labeled with characters or as a table with rows and columns corresponding to states and symbols, respectively. In this formal definition, the transition function $\delta$ is what ultimately specifies the transition. Specifically, for any state $q \in Q$ and any symbol $\mathtt{a} \in \Sigma$, the transition from state $q$ on symbol $a$ is given by $\delta(q, a)$.

Consider the following 5-tuple definition of a DFA:
$D = (Q, \Sigma, \delta, q_0, F)$, where
- $Q = \Set{q_0, q_1, q_2, q_3}$,
- $\Sigma = \Set{\mathtt{r}, \mathtt{s}}$,
- $\delta$ is defined as follows:
  \[\delta(q_k, a) = \left\lbrace \begin{array}{cl} \\ q_{k+1} & \text{if } a = \mathtt{r} \text{ and } k \ne 3 \\ q_k & \text{if } a = \mathtt{r} \text{ and } k = 3 \\ q_0 & \text{otherwise} \end{array} \right.\]
- $F = \Set{q_0, q_1, q_2}$
Use the DFA editor to draw this exact DFA.

Piece this apart one step at a time. How many states does this DFA have? Which one is the start state? What is the alphabet, and how many characters are in it? Which states are accepting/?

The trickiest part is decoding the transitions. Begin with the start state. Pick any character you want. Where do you go when you see that character? See if you can take things from there.

Once you’re done, a good question to ponder but not submit: what is the language of this DFA?

Consider the following 5-tuple definition of a DFA:
$D = (Q, \Sigma, \delta, q_0, F)$, where
- $Q = \Set{q_0, q_1, q_2}$,
- $\Sigma = \Set{\mathtt{r}, \mathtt{s}}$,
- $\delta$ is defined as follows:
  \[\delta(q_k, a) = \left\lbrace \begin{array}{cl} \\ q_{\frac{3k^2 - 7k + 4}{2}} & \text{if } a = \mathtt{r} \\ q_{\frac{-3k^2 + 5k + 2}{2}} & \text{if } a = \mathtt{s} \end{array} \right.\]
- $F = \Set{q_0}$.
Use the DFA editor to draw this exact DFA.

Again, we recommend thinking over what the language of this DFA is once you’ve drawn it out, but it's not required.

Problem Eight: Concatenation, Kleene Stars, and Complements

This problem is all about transformations on languages and the effects they have.

Prove or disprove: if $L$ is a nonempty, finite language and $k$ is a positive natural number, then $\abs{L}^k = \abs{L^k}$. Here, the notation $\abs{L}^k$ represents “the cardinality of $L$, raised to the $k$th power,” and the notation $\abs{L^k}$ represents “the cardinality of the $k$-fold concatenation of $L$ with itself.”

This is a "prove or disprove" problem, which means that your first task is to figure out whether this statement is true (in which case you'll prove it) or false (in which case you'll disprove it). To disprove a result, you'll write a proof of its negation. Typically, a disproof begins with wording like "We will prove the negation of the claim, specifically, $\blank$" and then proceeds as if it were a regular proof of the negation. Check the Proofwriting Checklist for more details. (But don't take this guidance as a subtle hint that you're supposed to disprove this statement - we're leaving that up for you to decide.)

Since you have to first decide whether this statement or its negation is true, write out both statements and explore each branch to see what you find.

Prove or disprove: there is a language $L$ where $\overline{\left(L^\star\right)} = \left(\overline{L}\right)^\star$.

A good warm-up problem: what is $\emptyset^\star$? Don’t answer this question based off of your intuition; look back at the formal definition of the Kleene star.

To prove the statement $S = T$ where $S$ and $T$ are sets, prove that $S \subseteq T$ and that $T \subseteq S$. To disprove the statement $S = T$, prove that there is some $x$ where either $x \in S$ and $x \notin T$ or $x \in T$ and $x \notin S$.

Optional Fun Problem: Why Finite?

A deterministic infinite automaton, or DIA, is a generalization of a DFA in which the automaton has infinitely many different states. Formally speaking, a DIA is given by the same 5-tuple definition as a DFA, except that $Q$ must be an infinite set. Since DIAs have infinitely many states, they’re mostly an object of purely theoretical study. You couldn’t actually build one in the real world.

Prove that if $L$ is an arbitrary language over some alphabet $\Sigma$, then there is a DIA whose language is $L$. To do so, show how to start with a language $L$, formally define a 5-tuple corresponding to a DIA for $L$, then explain why that DIA accepts all the strings in $L$ and nothing else.