Problem Set 8

Due Friday, November 19 at 2:30 pm Pacific

Solutions are available!

🔑 Solutions

In this problem set, you’ll transition away from the regular languages to the context-free languages and to the realm of Turing machines. This will be your first foray beyond the limits of what computers can over hope to accomplish, and we hope that you find this as exciting as we do!

As always, please feel free to drop by office hours or ask on Ed if you have any questions. We'd be happy to help out.

Good luck, and have fun!

Starter Files

Here are the starter files for the problems you'll submit electronically:

📦 PS8 Starter Files

Unpack the files somewhere convenient and open up the bundled project.

If you would like to type your solutions in $\LaTeX$, you may want to download this template file where you can place your answers:

🖋 PS8 $\LaTeX$ Template

Problem One: Designing CFGs

For each of the following languages, design a CFG for that language. To do so, download the starter files for Problem Set Eight and extract them somewhere convenient. Write and save your answers in the file res/Grammars.cfgs. Use the provided program to test and explore your CFG, and submit on Gradescope once you’re finished.

As a note – tests for CFGs take longer to run than tests for regular expressions, DFAs, etc. If the test driver is running slowly on your system, set the project to build in Release rather than Debug mode. To do this, go to the left side of the Qt Creator window and click the picture of a monitor with “Debug” written below it. Then, choose the “Release” option. This disables debugging and turns on optimization, which markedly speeds up the tests.

Before starting this problem, read the Guide to CFGs for advice and tips on how to design context-free grammars.

Given $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{c}}$, write a CFG for the language { $w \in \Sigma^\star \suchthat w$ contains aa as a substring }. For example, the strings aa, baac, and ccaabb are all in the language, but aba is not.

In our lecture on regular expressions, we wrote the following regular expression that matched email addresses:

a⁺(.a⁺)*@a⁺(.a⁺)⁺

Given $\Sigma = \Set{\mathtt{@}, \mathtt{.}, \mathtt{a}}$, write a CFG whose language is the same as the language of this regular expression.

Given $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$, write a CFG for the language $L$ = { $w \in \Sigma^\star \suchthat w$ is not a palindrome }, the language of strings that are not the same when read forwards and backwards. For example, $\mathtt{aab} \in L$ and $\mathtt{baabab} \in L$, but $\mathtt{aba} \notin L$, $\mathtt{bb} \notin L$, and $\varepsilon \notin L$.

Don’t try solving this one by starting with the CFG for palindromes and making modifications to it. In general, there’s no way to mechanically turn a CFG for a language $L$ into a CFG for the language $\overline{L}$, since the context-free languages aren’t closed under complementation. However, the idea of looking at the first and last characters of a given string might still be a good idea.

Let Σ be an alphabet containing these symbols:
\[\emptyset \quad \naturals \quad \{ \quad \} \quad , \quad \cup\]
We can form strings from these symbols which represent sets. Here's some examples:
- $\emptyset$
- $\Set{\Set{\naturals, \emptyset} \cup \Set{\emptyset}}$
- $\Set{\emptyset, \Set{\emptyset, \Set{\emptyset}}}$
- $\Set{\emptyset, \naturals} \cup \naturals \cup \emptyset$
- $\naturals \cup \Set{\naturals, \emptyset}$
- $\Set{\Set{\Set{\Set{\naturals}}}}$
- $\Set{\Set{}}$
- $\Set{\emptyset} \cup \naturals \cup \Set{\naturals}$
- $\Set{}$
- $\naturals$
- $\Set{\emptyset, \emptyset, \emptyset}$
- $\Set{\naturals}$
- $\Set{\emptyset, \Set{}}$
Notice that some of these sets, like $\Set{\emptyset, \emptyset}$, are syntactically valid but redundant, and others like {} are syntactically valid but not the cleanest way of writing things. Here's some examples of strings that don't represent sets or aren't syntactically valid:
- $\varepsilon$
- $\naturals, \emptyset, \Set{\emptyset}$
- $\{\emptyset$
- $\}\emptyset\{$
- $\Set{, \naturals}$
- $\}\} \naturals$
- $\emptyset \Set{\naturals}$
- ${ \naturals \emptyset },$
- $\Set{ \emptyset, \emptyset, \emptyset, }$
- $\{\{\}$
- $\Set{,}$
- $\Set{\naturals, , , \emptyset}$
Write a CFG for the language { $w \in \Sigma^\star \suchthat w$ is a syntactically valid string representing a set }. Please use the letters n, u, and o in place of $\naturals$, $\cup$, and $\emptyset$, respectively.

Fun fact: the starter files for Problem Set One contain a parser that’s designed to take as input a string representing a set and to reconstruct what set that is. The logic we wrote to do that parsing was based on a CFG we wrote for sets and set theory. Take CS143 if you’re curious how to go from a grammar to a parser!

As a hint, as is often the case when writing CFGs, we recommend that you use different nonterminals to represent different components of the string. For example, structure of a comma-separated list of sets is different than the structure of an expression representing a single set.

Problem Two: The Complexity of Addition

This problem explores the following question:

How hard is it to add two numbers?

Suppose that we want to check whether $x + y = z$, where $x$, $y$, and $z$ are all natural numbers. If we want to phrase this as a problem as a question of strings and languages, we will need to find some way to standardize our notation. In this problem, we will be using the unary number system, a number system in which the number $n$ is represented by writing out $n$ 1's. For example, the number $5$ would be written as 11111, the number $7$ as 1111111, the number $12$ as 111111111111, and the number 0 as $\varepsilon$.

Given the alphabet $\Sigma = \Set{\mathtt{1}, \mathtt{+}, \mathtt{=}}$, we can consider strings encoding $x + y = z$ by writing out $x$, $y$, and $z$ in unary. For example:

\[\begin{array}{rcl} 4 + 3 = 7 &\text{ would be encoded as } & \mathtt{1111+111=1111111} \\ 7 + 1 = 8 &\text{ would be encoded as } & \mathtt{1111111+1=11111111} \\ 0 + 1 = 1 &\text{ would be encoded as } & \mathtt{+1=1} \end{array}\]

Consider the following language, which we’ll call $ADD$:

\[\Set{ \mathtt{1}^m\mathtt{+1}^n\mathtt{=1}^{m+n} \suchthat m, n \in \naturals }\]

For example, the strings 111+1=1111 and +1=1 are in the language, but 1+11=11 is not, nor is the string 1+1+1=111.

Prove or disprove: the language $ADD$ defined above is regular.

To prove a language is regular, you could, for example, build a DFA, an NFA, or a regex for it. To prove a language is not regular, use the Myhill-Nerode theorem. You'll need to decide which case is applicable here.

Write a context-free grammar for $ADD$ in res/Grammars.cfgs, showing that $ADD$ is context-free.

You may find it easier to solve this problem if you first build a CFG for this language where you’re allowed to have as many numbers added together as you’d like. Once you have that working, think about how you’d modify it so that you have exactly two numbers added together on the left-hand side of the equation.

Problem Three: Brzozowski’s Theorem

Let's begin with a new definition. If $L$ is a language over $\Sigma$ and $x \in \Sigma^\star$, then the derivative of $L$ with respect to $\Sigma$, denoted $\partial_x L$, is the language defined below:

\[\partial_x L = \Set{w \in \Sigma^\star \suchthat xw \in L }\text.\]

This definition is a bit dense, so let’s start by unpacking it.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$ and let $L = \Set{ \varepsilon, \mathtt{aba}, \mathtt{bab}, \mathtt{aaab}, \mathtt{bbbaaa} }$. Answer each of the following questions; no justification is necessary.
1. What is the longest string in $\partial_\mathtt{a} L$? If two or more strings are tied for having the longest length, list any one of them.
2. What is the longest string $x \in \Sigma^\star$ for which $\abs{\partial_x L} = 1$? If two or more such strings are tied for having the longest length, list any one of them.
3. What is the shortest string $x \in \Sigma^\star$ for which $\abs{\partial_x L} = 0$? If two or more such strings are tied for having the shortest length, list any one of them.

Let $L = \Set{\mathtt{a}^n\mathtt{b}^n \suchthat n \in \naturals }\text.$ Write a CFG for $\partial_{\mathtt{aaa}} L$ in res/Grammars.cfg.

Now, for a remarkable theorem:

Brzozowski’s Theorem: Let $L$ be a language over $\Sigma$ and let $S \subseteq \Sigma^\star$ be a set containing infinitely many strings. If the following statement is true, then $L$ is not regular:
\[\forall x \in S. \forall y \in S. (x \ne y \to \partial_x L \ne \partial_y L)\]

Another way of stating this theorem is that if $L$ is a language with infinitely many different derivatives, then $L$ isn’t regular.

Prove Brzozowski’s Theorem.

Language derivatives have both practical and theoretical applications. Surprisingly, it's possible to build regular expression matchers that work by recursively computing derivatives of regular languages, and recent work has extended this to context-free grammar parsers. And amazingly, the converse of Brzozowski's theorem is also true: if a language $L$ has only finitely many derivatives, then $L$ is regular. Take CS154 to learn more about this!

Problem Four: Programming Turing Machines

The Church-Turing thesis might seem surprising in light of just how simple Turing machines are. Can they really do any computation that could be done by any feasible computing system? That is indeed the case, and one of the best ways to see this is to try your hand at designing some TMs to see how moving around and marking up a tape is all that’s needed to perform arbitrary computations.

The starter files for this problem contain our TM Debugger. This will let you run TMs on input strings, either by single-stepping using the ⏩ button or by running the program using the ⏵ button.

We’ve included a bunch of sample TMs for different languages as part of the starter files so that you can see some strategies for solving problems with TMs. They’re in the res/tm-samples/ directory. You aren’t required to read over them, though you might find it interesting to do so.

Unlike the DFA/NFA editor and like the regex and CFG editors, you will write your answers by opening the relevant files in Qt Creator and editing them with your answers.

As an FYI, the TM debugger allows you to input arbitrary strings as input to TMs and doesn’t validate that your input strings have the correct alphabet. For the purposes of this problem, you can assume that we will only test your TMs on strings with the correct alphabet.

You can use the “Run Tests” button to check your work and submit your answers to Gradescope once you’re ready. As always, submit as many times as you’d like; we’ll only grade the answers you submit. The autograder for this problem caps the number of steps allowed by your TMs so that if you get stuck in a loop the autograder won’t hang. (The cap is fairly generous, we don’t anticipate correct solutions being rejected for taking too many steps.) If you get a “timeout” error, it means that your TM hasn’t finished running before it accepts or rejects the input. The TM Debugger will show step counts for your TM, which might help you figure out what’s going on.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$. Program a TM that’s a decider for { $w \in \Sigma^\star \suchthat w$ has odd length and its middle character is a }. Save your answer in res/MiddleA.tm.

Let $\Sigma = \Set{\mathtt{a}}$. Program a TM that’s a decider for $\Set{ \mathtt{a}^{2^n} \suchthat n \in \naturals }\text.$ That is, your TM should accept all strings of a’s whose lengths are powers of two and reject all other strings. Save your answer in res/Power2.tm.

There are many ways you can approach this problem. One would be to keep cutting the input string length in half until it has an odd length (if that length is one you accept, and if it’s greater than one you reject). Another would be to start with a string of length one and keep doubling its length until it either perfectly matches the input’s length (accept) or exceeds it (reject). Explore and see what you find!

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{=}}\text.$ Program a TM that’s a decider for the language $L = \Set{ w\mathtt{=}w \suchthat w \in \Set{\mathtt{a}, \mathtt{b}}^\star }\text.$ For example, the strings aaa=aaa, bab=bab, b=b, and = are all in $L$, while the strings aaa=aab, =aaa, aaa=, and a=a=a are all not in $L$. Save your answer is res/Equal.tm.

One step in implementing this TM will be confirming that the input string has the right “shape,” in the sense that you don’t want to be worrying about inputs like ==== or aa=aa=aa.

If you enjoyed this exercise, feel free to write your own TMs for whatever languages most interest you. If you come up with anything interesting or exciting, feel free to email your creations to the staff mailing list!

Problem Five: Executable Computability Theory

In what follows, assume that $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$, meaning that you only need to consider input strings made of a’s and b’s.

Below are three examples of C++ functions that act as recognizers or deciders. For each of these functions, do the following:

Tell us what language the function is recognizer for. Use set-builder notation if appropriate.
Tell us whether the function is a decider for that language in addition to just being a recognizer.
Tell us whether the language you’ve identified is decidable.

No justification is required for any of these questions.

Remember that a language is decidable if there exists a decider $D$ for it – and that decider does not necessarily have to be the same as the function we’re asking you to look at.

Answer the three above questions about this function:

bool pizkwat(string input) {
    if (input.size() % 2 == 1) return false;
    
    /* Indices within strings range from 0 (the first character) to
     * index.size() - 1 (the last character), inclusive.
     */
    for (int i = 0; i < input.size() / 2; i++) {
        if (input[i] != 'a' || input[input.size() - 1 - i] != 'b') {
            return false;
        }
    }
    return true;
}

Answer the three above questions about this function.

bool squigglebah(string input) {
    std::set<string> strings = {""}; // Initialize a set holding the empty string
    while (true) {
        std::set<string> nextStrings; // Initially empty

        for (string x: strings) {
            if (input == x) return true;

            nextStrings.insert(x + "a"); // Add the string xa to nextStrings
            nextStrings.insert(x + "b"); // Add the string xb to nextStrings
        }

         /* On the next iteration through the main while loop, use nextStrings. */
         strings = nextStrings;
    }
}

A reminder for this part of the problem and for all other parts of the problem: you can assume all input strings are strings over $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$.

Answer the three above questions about this function.

bool moozle(string input) {
    int a = 0;
    int b = 1;
    while (input.size() != a) {
        int c = a + b;
        a = b;
        b = c;
    }
    return true;
}

Problem Six: What Does it Mean to Solve a Problem?

Let $L$ be a language over $\Sigma$ and $M$ be a TM with input alphabet $\Sigma$. Here are three potential traits of $M$:

$\forall w \in \Sigma^\star. M \haltson w.$
$\forall w \in \Sigma^\star. (M \accepts w \to w \in L).$
$\forall w \in \Sigma^\star. (M \rejects w \to w \notin L).$

At some level, for a TM to claim to solve a problem, it should have at least some of these properties. Interestingly, though, just having two of these properties doesn't say much.

Prove that if $L$ is any language over $\Sigma$, then there is a TM $M$ that satisfies properties (1) and (2).

This is an existential statement, so to prove it, you will need to give a concrete example of a TM with these properties, or at least show how to construct one.

The whole point of this problem is to show that you have to be extremely careful about how you define “solving a problem,” since if you define it incorrectly then you can “solve” a problem in a way that bears little resemblance to what we’d think of as solving a problem. Keep this in mind as you work through this one.

Prove that if $L$ is any language over $\Sigma$, then there is a TM $M$ that satisfies properties (1) and (3).

Prove that if $L$ is any language over $\Sigma$, then there is a TM $M$ that satisfies properties (2) and (3).

Suppose $L$ is a language over $\Sigma$ for which there is a TM $M$ that satisfies all of properties (1), (2), and (3). What can you say about $L$? Prove it.

In the earlier parts of this problem you showed that just having two of these three properties doesn't tell you much about $L$. As you're writing your proof for this problem, make sure you see exactly where you're citing all three of the relevant properties. If you don't use all three properties, chances are that your proof is incorrect.

Optional Fun Problem: TMs and Regular Languages

Let $M$ be a recognizer for a language $L$ over $\Sigma$. Suppose $M$ has the following property: there exists a natural number $k$ such that when $M$ is run on any string $w$, $M$ always halts after at most $k$ steps.

Prove that $L$ is regular.