Problem Set 6

Due Friday, November 11 at 2:30 pm Pacific

Solutions are available!

🔑 Solutions

This sixth problem set explores formal language theory, finite automata, the regular languages, and their properties. This will be your first foray into computability theory, and I hope you find it fun and exciting!

As always, please feel free to drop by office hours, ask on Ed, or email the staff list if you have any questions. We'd be happy to help out.

Good luck, and have fun!

Starter Files

Here are the starter files for the problems you'll submit electronically:

📦 PS6 Starter Files

Unpack the files somewhere convenient and open up the bundled project.

If you would like to type your solutions in $\LaTeX$, you may want to download this template file where you can place your answers:

🖋 PS6 $\LaTeX$ Template

Problem One: Constructing DFAs

Download the starter files for Problem Set Six and extract them somewhere convenient on your system. Run the program to access the automaton editor, which you’ll use throughout the problem set. The provided program offers this functionality:

Automaton Editor: Like the Graph Editor from PS4, use this tool to design automata.
Automaton Tester: A tool you can use to test your automata on various input strings. Enter strings in the text box on the left, and the tool will show you which of those strings your automata accepts and which of those strings it rejects. You can follow each string with what you expect the automaton to do, and the tool will tell you whenever the expected output doesn’t match the actual output. For example, you might start with these test strings for the automaton in part (i):
- BBIIB yes
- BBB no
Automaton Debugger: A tool for single-stepping through the operation of an automaton on a string. You can use this to see how your automata process individual strings.

There’s also an automated test suite you can use to check your work.

For each of the following languages over the indicated alphabets, construct a DFA that accepts precisely the strings that are in the indicated language. Your DFA does not have to have the fewest number of states possible, though for your own edification we’d recommend trying to construct the smallest DFAs possible. Use the Automaton Editor to compose your answers, and submit your finished automata to Gradescope.

There are many ways to tile a $2 \times 8$ checkerboard with dominoes, two of which are shown here:

Notice that the horizontal dominoes must appear as stacked pairs (do you see why?) We can read each tiling from left to right as a string made from the characters I and B, where I denotes “a vertical domino” and B denotes “two horizontal dominoes.” The top tiling here would be represented as BIBIII and the bottom tiling as IBBBI.

Let $\Sigma$ be the alphabet $\Set{\mathtt{B}, \mathtt{I}}$. Construct a DFA for the language { $w \in \Sigma^* \suchthat w$ represents a domino tiling of a $2 \times 8$ checkerboard }.

You’re taking a walk with your dog along a straight-line path. Your dog is on a leash of length two, so the distance between you and your dog can be at most two units. You and your dog start at the same position. Consider the alphabet $\Sigma = \Set{\mathtt{y}, \mathtt{d}}$. A string in $\Sigma^*$ can be thought of as a series of events in which either you or your dog moves forward one unit. For example, the string yydd means you take two steps forward, then your dog takes two steps forward.

Let $L$ be the language { $w \in \Sigma^* \suchthat w$ describes a series of steps where you and your dog are never more than two units apart }. Construct a DFA for $L$.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$. Construct a DFA for the language $L$ = { $w \in \Sigma^* \suchthat w$ contains the same number of instances of the substring ab and the substring ba }. Note that substrings are allowed to overlap, so we have $\mathtt{aba} \in L$ (one copy of each substring) and $\mathtt{babab} \in L$ (two copies of each substring).

Let $\Sigma = \Set{\mathtt{a}, \mathtt{c}, \mathtt{m}, \mathtt{o}}$. Construct a DFA for the language $L$ = { $w \in \Sigma^* \suchthat w$ contains the word cocoa as a substring }. For example, $\mathtt{mm\underline{cocoa}mm} \in L$ and $\mathtt{\underline{cocoa}} \in L$, but $\mathtt{c} \notin L$, $\mathtt{cmomcmoma} \notin L$ (though cocoa is a subsequence of cmomcmoma, it’s not a substring), and $\varepsilon \notin L$.

Some trickier cases to watch for: $\mathtt{ccc\underline{cocoa}} \in L$, and $\mathtt{co\underline{cocoa}} \in L$.

Problem Two: Constructing NFAs

For each of the following languages over the indicated alphabets, use the Automaton Editor to design an NFA that accepts precisely the strings that are in the indicated language. As before, while you don’t have to design the smallest NFAs possible, we recommend that you try to keep your NFAs small both to make testing easier and for your own edification.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{c}}$. Construct an NFA for { $w \in \Sigma^* \suchthat w$ ends in a, bb, or ccc }.

While it’s possible to do this completely deterministically, it’s a bit easier if you use the “guess-and-check” framework we talked about in class.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{c}, \mathtt{d}, \mathtt{e}}$. Construct an NFA for the language $L$ = { $w \in \Sigma^* \suchthat $ the letters in $w$ are sorted alphabetically }. For example, $\mathtt{abcde} \in L$, $\mathtt{bee} \in L$, $\mathtt{a} \in L$ and $\varepsilon \in L$, but $\mathtt{decade} \notin L$.

You could do this deterministically, but that will require a lot of transitions. You can dramatically reduce that number by using ε-transitions strategically.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{c}, \mathtt{d}, \mathtt{e}}$. Construct an NFA for the language { $w \in \Sigma^* \suchthat$ the last character of $w$ appears nowhere else in $w$, and $\abs{w} \ge 1$ }.

This problem is all about embracing nondeterminism. Use the “guess-and-check” framework. What information would you want to guess? How would you check it? Please don’t try to solve this problem by building a DFA for this language; you’ll need at least 50 states if you tried to approach things this way.

Stuck? Try reducing the alphabet to two or three letters and see if you can solve that version.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$. Construct an NFA for the language $L$ = { $w \in \Sigma^* \suchthat w$ contains at least two b's with exactly five characters between them }. For example, $\mathtt{\underline{b}aaaaa\underline{b}} \in L$, $\mathtt{aabaa\underline{b}aaabb\underline{b}} \in L$, and $\mathtt{a\underline{b}bbbba\underline{b}aaaaaaab} \in L$, but $\mathtt{bbbbb} \notin L$, $\mathtt{bbbab} \notin L$, and $\mathtt{aaabab} \notin L$.

The smallest DFA for this language is huge, so please don’t solve this one deterministically. Embrace the nondeterminism! What would you like to guess? How would you check that your guess is right?

Problem Three: $\powerset{\Sigma^*}$

Let $\Sigma$ be an alphabet. Give a short English description of the set $\powerset{\Sigma^*}$.

We think that there is a single “best answer” that makes appropriate use of the terminology from our lectures on formal languages. You should be able to describe the set in at most ten words.

Problem Four: Much Ado About Nothing, Part II

We’ve covered a bunch of terminology in the past week. This problem is designed to help you make sense of what all these terms mean.

Are there any languages $L$ where $\varepsilon \in L$? If so, give an example of one. If not, explain why not.

Are there any languages $L$ where $\varepsilon \notin L$? If so, give an example of one. If not, explain why not.

Are there any languages $L$ where $\varepsilon \subseteq L$? If so, give an example of one. If not, explain why not.

Are there any languages $L$ where $\varepsilon \not\subseteq L$? If so, give an example of one. If not, explain why not.

Does $\emptyset = \varepsilon$? Briefly explain your answer.

Does $\emptyset = \Set{\varepsilon}$? Briefly explain your answer.

Problem Five: Hard Resets

A hard reset string for a DFA is a string $w$ with the following property: starting from any state in the DFA, if you read $w$, you end up in the DFA's start state.

Hard reset strings have many practical applications. For example, suppose you're remotely controlling a Mars rover whose state you're modeling as a DFA. Imagine there's a hardware glitch that puts the Mars rover into a valid but unknown state. Since you can't physically go to Mars to pick up the rover and fix it, the only way to change the rover's state would be to issue it new commands. To recover from this mishap, you could send the rover a hard reset string. Regardless of what state the rover got into, this procedure would guarantee that it would end up in its initial configuration.

Here is an algorithm that, given any DFA, will let you find every hard reset string for that DFA:

Add a new start state $q_s$ to the automaton with $\varepsilon$-transitions to every state in the DFA.
Perform the subset construction on the resulting NFA to produce a new DFA called the power automaton.
If the power automaton contains a state corresponding solely to the original DFA's start state, make that state the only accepting state in the power automaton. Otherwise, make every state in the power automaton a rejecting state.

This process produces a new automaton that accepts all the hard reset strings of the original DFA. It's possible that a DFA won't have any hard reset strings (for example, if it contains a dead state), in which case the new DFA won't accept anything.

Apply the above algorithm to the following DFA and give us a hard reset string for that DFA. For simplicity, please give the subset-constructed DFA as a transition table rather than a state-transition diagram. We've given you space for the table over to the right, and to be nice, we've given you exactly the number of rows you'll need.

\[\begin{array}{|c|c|c|} \hline \text{state} & \mathtt{a} & \mathtt{b} \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline & & \\ \hline \end{array}\]

Sample hard reset string: $\blank$.

Problem Six: Monoids and Kleene Stars

There’s a special type of language called a monoid that pops up when exploring formal language theory. Specifically, we define monoids as follows: if $M$ is a language over $\Sigma$, then

\[M \text{ is a monoid} \qquad \text{if} \qquad \varepsilon \in M \text{ and } MM \subseteq M\text.\]

Let’s take a minute to get a handle on what monoids look like.

Give us two examples of monoids, one of which contains finitely many strings and one of which contains infinitely many strings. No justification is required.

Your next task is to prove an important result about powers of languages.

Let $L$ be a language over some alphabet $\Sigma$. Prove, by induction, that $L^nL^m = L^{m+n}$ for all natural numbers $m$ and $n\text.$ To do so, prove that the following predicate $P(n)$ is true for all natural numbers $n$:

$P(n)$ is the statement "for all $m \in \naturals$, we have $L^nL^m = L^{m+n}$."

Your proof should leverage the formal definition of language powers, which is reprinted below:
\[L^0 = \Set{\varepsilon} \qquad L^{n+1} = LL^n\]
Feel free to use these facts, which you can state without proof:
- Language concatenation is associative: $A(BC) = (AB)C\text.$
- The language $\Set{\varepsilon}$ is the identity language for concatenation: $\Set{\varepsilon}L = L\Set{\varepsilon} = L\text.$
While intuitively we think of $L^k$ as “all strings made of $k$ strings from $L$,” the formal definition of $L^k$ is the inductive one given here. Use that formal definition rather than the higher-level intuition.

There is a deep, fundamental connection between Kleene stars and monoids.

Let $L$ be an arbitrary language over $\Sigma$. Prove that $L^\star$ is a monoid over $\Sigma$. In the course of writing this proof, please call back to the formal definition of language concatenation and the Kleene star:
\[L^\star = \Set{w \suchthat \exists n \in \naturals. w \in L^n }\] \[L_1L_2 \Set{ x \suchthat \exists w_1 \in L_1. \exists w_2 \in L_2. x = w_1w_2 }\]

This is a great spot to write out two columns, one for “what I’m assuming” and one for “what I need to show.” Again, don’t assume anything about language powers or concatenation without first proving it.

Remember that to prove $S \subseteq T$, you should pick an arbitrary $x \in S$ and then prove $x \in T$.

Let $L$ be a language over $\Sigma$ and let $M$ be a monoid over $\Sigma$. Prove that if $L \subseteq M$, then $L^\star \subseteq M$.

This problem is all about finding the right way to formalize things. Break the task down into smaller pieces and make sure you organize everything in a way that makes the logical flow easy to read and rigorously covers all cases. Once you have the setup put together, dive in and fill out each section.

Parts (iii) and (iv) of this problem collectively say “$L^\star$ is the language formed by adding the fewest strings to $L$ to make it a monoid.” To see why, note that $L^\star$ is a monoid, and we can’t take any strings out of it and be left with a monoid, since $L^\star$ is a subset of all monoids containing $L$. That’s a very different perspective on the Kleene star than what we started with!

Monoids, in a slightly more general sense than what you've seen here, have applications in parallel computing, functional programming, and theoretical CS. Take CS149 and CS242 for details!

Problem Seven: Concatenation, Kleene Stars, and Complements

This problem explores some subtle but important properties of language transformations.

Prove or disprove: if $L$ is a nonempty, finite language and $k$ is a positive natural number, then $\abs{L}^k = \abs{L^k}$. Here, the notation $\abs{L}^k$ represents “the cardinality of $L$, raised to the $k$th power,” and the notation $\abs{L^k}$ represents “the cardinality of the $k$-fold concatenation of $L$ with itself.”

This is a "prove or disprove" problem, which means that your first task is to figure out whether this statement is true (in which case you'll prove it) or false (in which case you'll disprove it). To disprove a result, you'll write a proof of its negation. Typically, a disproof begins with wording like "We will prove the negation of the claim, specifically, $\blank$" and then proceeds as if it were a regular proof of the negation. Check the Proofwriting Checklist for more details. (But don't take this guidance as a subtle hint that you're supposed to disprove this statement - we're leaving that up for you to decide.)

Since you have to first decide whether this statement or its negation is true, write out both statements and explore each branch to see what you find.

Now, a new definition that we'll make use of later in the quarter. If $L \subseteq \Sigma^\star$ is a language, the complement of $L\text,$ denoted $\overline{L}\text,$ is the language

\[\overline{L} = \Sigma^\star - L\text.\]

Intuitively, $\overline{L}$ consists of all the strings over $\Sigma$ that are not in $L\text.$

Prove or disprove: there is a language $L$ where $\overline{\left(L^\star\right)} = \left(\overline{L}\right)^\star$.

A good warm-up problem: what is $\emptyset^\star$? Don’t answer this question based off of your intuition; look back at the formal definition of the Kleene star.

To prove the statement $S = T$ where $S$ and $T$ are sets, prove that $S \subseteq T$ and that $T \subseteq S$. To disprove the statement $S = T$, prove that there is some $x$ where either $x \in S$ and $x \notin T$ or $x \in T$ and $x \notin S$.

Optional Fun Problem: Doubling Down

Let $L$ be a regular language over $\Sigma$. Prove that the language $L_{(2)} = \Set{ w \in \Sigma^\star \suchthat ww \in L }$ is regular.