Problem Set 8

Due Sunday, November 24 at 1:00 pm Pacific

Solutions are available!

🔑 Solutions

In this eighth problem set, you’ll transition away from the regular languages to the context-free languages and to the realm of Turing machines. This will be your first foray beyond the limits of what computers can over hope to accomplish, and we hope that you find this as exciting as we do!

As always, please feel free to drop by office hours or ask on Ed if you have any questions. We'd be happy to help out.

Good luck, and have fun!

Starter Files

Here are the starter files for the problems you'll submit electronically:

📦 PS8 Starter Files

Unpack the files somewhere convenient and open up the bundled project.

If you would like to type your solutions in $\LaTeX$, you may want to download this template file where you can place your answers:

🖋 PS8 $\LaTeX$ Template

Problem One: Designing CFGs

For each of the following languages, design a CFG for that language. To do so, download the starter files for Problem Set Eight and extract them somewhere convenient. Write and save your answers in the file res/Grammars.cfgs. Use the provided program to test and explore your CFG, and submit on Gradescope once you’re finished.

As a note – tests for CFGs take longer to run than tests for regular expressions, DFAs, etc. If the test driver is running slowly on your system, set the project to build in Release rather than Debug mode. To do this, go to the left side of the Qt Creator window and click the picture of a monitor with “Debug” written below it. Then, choose the “Release” option. This disables debugging and turns on optimization, which markedly speeds up the tests.

Before starting this problem, read the Guide to CFGs for advice and tips on how to design context-free grammars.

Given $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{c}}$, write a CFG for the language { $w \in \Sigma^\star \suchthat w$ contains aa as a substring }. For example, the strings aa, baac, and ccaabb are all in the language, but aba is not.

In our lecture on regular expressions, we wrote the following regular expression that matched email addresses:

a⁺(.a⁺)*@a⁺(.a⁺)⁺

Given $\Sigma = \Set{\mathtt{@}, \mathtt{.}, \mathtt{a}}$, write a CFG whose language is the same as the language of this regular expression.

Given $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$, write a CFG for the language $L$ = { $w \in \Sigma^\star \suchthat w$ is not a palindrome }, the language of strings that are not the same when read forwards and backwards. For example, $\mathtt{aab} \in L$ and $\mathtt{baabab} \in L$, but $\mathtt{aba} \notin L$, $\mathtt{bb} \notin L$, and $\varepsilon \notin L$.

Don’t try solving this one by starting with the CFG for palindromes and making modifications to it. In general, there’s no way to mechanically turn a CFG for a language $L$ into a CFG for the language $\overline{L}$, since the context-free languages aren’t closed under complementation. However, the idea of looking at the first and last characters of a given string might still be a good idea.

Suppose that we want to check whether $x + y = z$, where $x$, $y$, and $z$ are all natural numbers. If we want to phrase this as a problem as a question of strings and languages, we will need to find some way to standardize our notation. In this problem, we will be using the unary number system, a number system in which the number $n$ is represented by writing out $n$ 1's. For example, the number $5$ would be written as 11111, the number $7$ as 1111111, the number $12$ as 111111111111, and the number 0 as $\varepsilon$.

Given the alphabet $\Sigma = \Set{\mathtt{1}, \mathtt{+}, \mathtt{=}}$, we can consider strings encoding $x + y = z$ by writing out $x$, $y$, and $z$ in unary. For example:
\[\begin{array}{rcl} 4 + 3 = 7 &\text{ would be encoded as } & \mathtt{1111+111=1111111} \\ 7 + 1 = 8 &\text{ would be encoded as } & \mathtt{1111111+1=11111111} \\ 0 + 1 = 1 &\text{ would be encoded as } & \mathtt{+1=1} \end{array}\]
Consider the following language, which we’ll call $ADD$:
\[\Set{ \mathtt{1}^m\mathtt{+1}^n\mathtt{=1}^{m+n} \suchthat m, n \in \naturals }\]
For example, the strings 111+1=1111 and +1=1 are in the language, but 1+11=11 is not, nor is the string 1+1+1=111.

Write a CFG for $ADD\text.$

You may find it easier to first write a CFG for strings purely of the form $\mathtt{1}^m\mathtt{=}\mathtt{1}^m\text.$ Once you've done that, see if you can then extend it to the full language.

Let $\Sigma$ be an alphabet containing these symbols:
\[\emptyset \quad \naturals \quad \{ \quad \} \quad , \quad \cup\]
We can form strings from these symbols which represent sets. Here's some examples:
- $\emptyset$
- $\Set{\Set{\naturals, \emptyset} \cup \Set{\emptyset}}$
- $\Set{\emptyset, \Set{\emptyset, \Set{\emptyset}}}$
- $\Set{\emptyset, \naturals} \cup \naturals \cup \emptyset$
- $\naturals \cup \Set{\naturals, \emptyset}$
- $\Set{\Set{\Set{\Set{\naturals}}}}$
- $\Set{\Set{}}$
- $\Set{\emptyset} \cup \naturals \cup \Set{\naturals}$
- $\Set{}$
- $\naturals$
- $\Set{\emptyset, \emptyset, \emptyset}$
- $\Set{\naturals}$
- $\Set{\emptyset, \Set{}}$
Notice that some of these sets, like $\Set{\emptyset, \emptyset}$, are syntactically valid but redundant, and others like {} are syntactically valid but not the cleanest way of writing things. Here's some examples of strings that don't represent sets or aren't syntactically valid:
- $\varepsilon$
- $\naturals, \emptyset, \Set{\emptyset}$
- $\{\emptyset$
- $\}\emptyset\{$
- $\Set{, \naturals}$
- $\}\} \naturals$
- $\emptyset \Set{\naturals}$
- ${ \naturals \emptyset },$
- $\Set{ \emptyset, \emptyset, \emptyset, }$
- $\{\{\}$
- $\Set{,}$
- $\Set{\naturals, , , \emptyset}$
Write a CFG for the language { $w \in \Sigma^\star \suchthat w$ is a syntactically valid string representing a set }. Please use the letters n, u, and o in place of $\naturals$, $\cup$, and $\emptyset$, respectively.

Fun fact: the starter files for Problem Set One contain a parser that’s designed to take as input a string representing a set and to reconstruct what set that is. The logic we wrote to do that parsing was based on a CFG we wrote for sets and set theory. Take CS143 if you’re curious how to go from a grammar to a parser!

As a hint, as is often the case when writing CFGs, we recommend that you use different nonterminals to represent different components of the string. For example, structure of a comma-separated list of sets is different than the structure of an expression representing a single set.

Problem Two: The Fixed-Point Theorem

Let $\Sigma$ be an alphabet. We can then think of a function $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ as a function that takes a language over $\Sigma$ as input and produces a language over $\Sigma$ as output. (Think back to PS6's question about $\powerset{\Sigma^\star}$ if you're curious about why this is.)

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$ and let $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ be defined as
\[f(L) = L \cup \set{\mathtt{a}}L\]
Fill in each of the following blanks without using set-builder notation. No justification is necessary.
- $f(\set{\varepsilon, \mathtt{a}, \mathtt{b}}) = \blank\text.$
- $f(\emptyset) = \blank\text.$
- $f(\Sigma^\star) = \blank\text.$

Now, a new definition. A function $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ is called monotone if the following is true:

\[\forall A \in \powerset{\Sigma^\star}. \forall B \in \powerset{\Sigma^\star}. (A \subseteq B \to f(A) \subseteq f(B))\text.\]

This definition is a bit more subtle than it might initially appear.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$. Below is a list of four functions from $\powerset{\Sigma^\star}$ to itself. For each function, determine whether it's monotone. No justification is required.
1. $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ defined as $f(L) = L \cup \Set{\mathtt{aaa}}\text.$
2. $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ defined as $f(L) = L \cap \Set{\mathtt{aaa}}\text.$
3. $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ defined as $f(L) = L - \set{\mathtt{a}}L\text.$
4. $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ defined as $f(L) = L^\star\text.$

A fixed point of a function $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ is a language $L$ where $f(L) = L$.

For each of the functions from part (ii) of this problem, determine whether that function has any fixed points and list one if one exists. No justification is necessary.

The rest of this problem considers a remarkable theorem about monotone functions:

Theorem: Let $\Sigma$ be an alphabet and let $f : \powerset{\Sigma^\star} \to \powerset{\Sigma^\star}$ be a monotone function. Then $f$ has a fixed point.

We're going to ask you to prove this theorem by showing that the following language $X$ is a fixed point of $f$:

\[X = \Set{ w \in \Sigma^\star \suchthat \exists L \in \powerset{\Sigma^\star}. (w \in L \land L \subseteq f(L))}\text.\]

Yes, this definition is a mouthful. But it turns out that, indeed, if $f$ is monotone, then $X = f(X)\text.$

Let $L$ be a language where $L \subseteq f(L)$. Prove that $L \subseteq X\text.$

Prove that $X = f(X)\text.$

The overall proof of this result is not very long, but it's also not "obvious." This will be a great place to write out a list of all of the assumptions you're making and all the facts you know about $f$ and $X\text.$ Setting this problem up correctly will make it much easier to discover the solution. Make sure to include your result from part (iv) in that list of facts you know about $f$ and $X$ as it will make a couple of appearances in the proof.

Once you've finished this problem, pat yourself on the back - you've made amazing progress since the start of the quarter!

Fixed points are an amazingly powerful tool in Theoryland. They're used in optimizing compilers to infer information about what a program does, to show certain problems are undecidable, to formally define recursive functions in programming, to prove the existence of equilibria in economics, and to devise fair division protocols in algorithmic game theory. One fun application of fixed points is Sperner's lemma, which can be used to divide rent fairly among three people sharing an apartment with unequal rooms. Take CS143, CS154, CS242, and CS243 to learn more!

Problem Three: Unstarring a Language

As you saw on Problem Set Six, a monoid is a language $M$ where $\epsilon \in M$ and $MM \subseteq M\text.$ You also proved that $L^\star$ is a monoid for any language $L\text.$

If $M$ is a monoid over $\Sigma$, a string $w$ is called a codeword of $M$ when the following is true:

\[w \in M \quad \land \quad w \ne \varepsilon \quad \land \quad \forall x \in M. \forall y \in M. (w = xy \to x = \varepsilon \lor y = \varepsilon)\text.\]

Let's take a minute to explore this definition.

Let $L = \Set{ w \in \Set{\mathtt{a}, \mathtt{b}}^\star \suchthat \abs{w} \text{ is even}}\text.$ Which of the following are codewords of $L\text?$ No justification is necessary.
1. $\varepsilon$
2. $\mathtt{a}$
3. $\mathtt{aa}$
4. $\mathtt{aaa}$
5. $\mathtt{aaaa}$

One last definition. The unstar of a monoid $M$, denoted $M^\dagger,$ is the language

\[M^\dagger = \Set{ w \suchthat w \text{ is a codeword in } M}\text.\]

The unstar of a monoid reveals much of the structure of the monoid itself.

Fill in all of the blanks below without using set-builder notation. For example, you could write $\Set{\varepsilon, \mathtt{a}, \mathtt{aa}, \mathtt{aaa}}\text,$ but not $\Set{ \mathtt{a}^n \suchthat n \le 3 }\text{.}$
1. $\Set{\varepsilon}^\dagger = \blank\text.$
2. $(\Set{\mathtt{a}, \mathtt{b}}^\star)^\dagger = \blank\text.$
3. $\Set{ w \in \Set{\mathtt{a}, \mathtt{b}}^\star \suchthat \abs{w} \text{ is even}}^\dagger = \blank\text.$
4. $\Set{ w \in \Set{\mathtt{a}, \mathtt{b}}^\star \suchthat w \text{ doesn't start with } \mathtt{a} \text{ and doesn't have } \mathtt{aaa} \text{ as a substring}}^\dagger = \blank\text.$

Prove that if $M$ is a monoid over an alphabet $\Sigma\text,$ then $(M^\dagger)^\star = M\text.$

Remember that, to prove two sets are equal, you need to show that each is a subset of the other. Each of these will require a separate line of reasoning. One of them is an immediate consequence of something you proved on Problem Set Six, so you should only need to write about a sentence or two to prove it.

For the other subset proof, proceed by complete induction on the length of the string in question.

One way to interpret your result from part (iii) of this problem is that every string in $M$ can be written by concatenating together some number of strings from $M^\dagger\text.$ This has applications in data compression, networking, and error-correcting codes. Take EE376A (information theory) or CS250 (algebraic error-correcting codes) for more on these topics.

Problem Four: Programming Turing Machines

The Church-Turing thesis might seem surprising in light of just how simple Turing machines are. Can they really do any computation that could be done by any feasible computing system? That is indeed the case, and one of the best ways to see this is to try your hand at designing some TMs to see how moving around and marking up a tape is all that’s needed to perform arbitrary computations.

The starter files for this problem contain our TM Debugger. This will let you run TMs on input strings, either by single-stepping using the ⏩ button or by running the program using the ⏵ button.

We’ve included a bunch of sample TMs for different languages as part of the starter files so that you can see some strategies for solving problems with TMs. They’re in the res/tm-samples/ directory. You aren’t required to read over them, though you might find it interesting to do so.

Unlike the DFA/NFA editor and like the regex and CFG editors, you will write your answers by opening the relevant files in Qt Creator and editing them with your answers.

As an FYI, the TM debugger allows you to input arbitrary strings as input to TMs and doesn’t validate that your input strings have the correct alphabet. For the purposes of this problem, you can assume that we will only test your TMs on strings with the correct alphabet.

You can use the “Run Tests” button to check your work and submit your answers to Gradescope once you’re ready. As always, submit as many times as you’d like; we’ll only grade the answers you submit. The autograder for this problem caps the number of steps allowed by your TMs so that if you get stuck in a loop the autograder won’t hang. (The cap is fairly generous, we don’t anticipate correct solutions being rejected for taking too many steps.) If you get a “timeout” error, it means that your TM hasn’t finished running before it accepts or rejects the input. The TM Debugger will show step counts for your TM, which might help you figure out what’s going on.

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$. Program a TM that’s a decider for { $w \in \Sigma^\star \suchthat w$ has odd length and its middle character is a }. Save your answer in res/MiddleA.tm.

Let $\Sigma = \Set{\mathtt{a}}$. Program a TM that’s a decider for $\Set{ \mathtt{a}^{2^n} \suchthat n \in \naturals }\text.$ That is, your TM should accept all strings of a’s whose lengths are powers of two and reject all other strings. Save your answer in res/Power2.tm.

There are many ways you can approach this problem. One would be to keep cutting the input string length in half until it has an odd length (if that length is one you accept, and if it’s greater than one you reject). Another would be to start with a string of length one and keep doubling its length until it either perfectly matches the input’s length (accept) or exceeds it (reject). Explore and see what you find!

Let $\Sigma = \Set{\mathtt{a}, \mathtt{b}, \mathtt{=}}\text.$ Program a TM that’s a decider for the language $L = \Set{ w\mathtt{=}w \suchthat w \in \Set{\mathtt{a}, \mathtt{b}}^\star }\text.$ For example, the strings aaa=aaa, bab=bab, b=b, and = are all in $L$, while the strings aaa=aab, =aaa, aaa=, and a=a=a are all not in $L$. Save your answer is res/Equal.tm.

One step in implementing this TM will be confirming that the input string has the right “shape,” in the sense that you don’t want to be worrying about inputs like ==== or aa=aa=aa.

If you enjoyed this exercise, feel free to write your own TMs for whatever languages most interest you. If you come up with anything interesting or exciting, feel free to email your creations to the staff mailing list!

Problem Five: Executable Computability Theory

In what follows, assume that $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$, meaning that you only need to consider input strings made of a’s and b’s.

Below are three examples of C++ functions that act as recognizers or deciders. For each of these functions, do the following:

Tell us what language the function is recognizer for. Use set-builder notation if appropriate.
Tell us whether the function is a decider for that language in addition to just being a recognizer.
Tell us whether the language you’ve identified is decidable.

No justification is required for any of these questions.

Remember that a language is decidable if there exists a decider $D$ for it – and that decider does not necessarily have to be the same as the function we’re asking you to look at.

Answer the three above questions about this function:

bool pizkwat(string input) {
    if (input.size() % 2 == 1) return false;
    
    /* Indices within strings range from 0 (the first character) to
     * index.size() - 1 (the last character), inclusive.
     */
    for (int i = 0; i < input.size() / 2; i++) {
        if (input[i] != 'a' || input[input.size() - 1 - i] != 'b') {
            return false;
        }
    }
    return true;
}

Answer the three above questions about this function.

bool squigglebah(string input) {
    std::set<string> strings = {""}; // Initialize a set holding the empty string
    while (true) {
        std::set<string> nextStrings; // Initially empty

        for (string x: strings) {
            if (input == x) return true;

            nextStrings.insert(x + "a"); // Add the string xa to nextStrings
            nextStrings.insert(x + "b"); // Add the string xb to nextStrings
        }

         /* On the next iteration through the main while loop, use nextStrings. */
         strings = nextStrings;
    }
}

A reminder for this part of the problem and for all other parts of the problem: you can assume all input strings are strings over $\Sigma = \Set{\mathtt{a}, \mathtt{b}}$.

Answer the three above questions about this function.

bool moozle(string input) {
    int a = 0;
    int b = 1;
    while (input.size() != a) {
        int c = a + b;
        a = b;
        b = c;
    }
    return true;
}

Problem Six: Timing Attacks

Suppose you're tasked with implementing a login system for a website. One of the user's passwords is the string quokka. You might think that the following piece of code would be a good way to validate their password:

bool isCorrectPassword(const string& input) {
    return input == "quokka";
}

Unfortunately, this is not a safe way to check whether the password is correct. To see why, let's think about how the computer compares two strings. A typical implementation of the == operator on strings might go through the two strings one character at a time until either (1) a mismatch is found, in which case the comparison returns false, or all the characters match, in which case the comparison returns true. This means that it will take longer for the isCorrectPassword function to return false given the input quoted than given the input qwerty, since quoted matches quokka to three places, whereas qwerty only matches the first character. This means that if a malicious user (an attacker in computer security lingo) is able to measure how long it takes for the password to be rejected, they can start to figure out what the password is one character at a time. First, the attacker tries logging in with all one-character passwords and see which one takes the longest to be rejected; that will (statistically) be the first character of the password. Then they can try tacking on all possible next characters to that first character and see which one takes the longest to be rejected, etc. In this way, they can fairly quickly figure out the full password.

This is a classic example of a timing attack, using the fact that the password-checking algorithm doesn't take the same amount of time to complete on all inputs. By using timing information to figure out how the code executed, and thus what portion of the password is correct, an attacker can sometimes directly break into a system without any prior knowledge of the password.

Foiling timing attacks is hard and is usually the provenance of trained security researchers. One common way to mitigate them is to intentionally slow down implementations so that every path through the code takes the exact same amount of time. This runs contrary to our typical intuition that programs should be made to run efficiently, but is necessary to foil this sort of timing attack.

We can formalize this idea theoretically using Turing machines. First, a quick definition. If $M$ is a TM that halts on a string $w\text,$ we denote by $t(M, w)$ the number of steps $M$ makes on $w$ before halting. We define one "step" of a TM to be the execution of a single line of the TM, whether that's a Write, Move, Goto, Return, If, or If Not statement, or whether that's executing a label (which has no effect). (Comments and blank lines don't count as steps.)

Use the provided TM simulator in the starter files to run the TM tm-samples/AnBnCn.tm on the input abc. How many steps does it take for the TM to halt?

Now, our key definition. A constant-time TM is a TM $M$ with the following properties:

\[\begin{aligned} \text{(I)} & \quad \forall w \in \Sigma^\star. M \haltson w\text. \\ \text{(II)} & \quad \forall w \in \Sigma^\star. \forall x \in \Sigma^\star. t(M, w) = t(M, x)\text. \end{aligned}\]

Intuitively, a constant-time TM is one that always takes the same number of steps to halt. It's thus immune to timing attacks, since every execution of the TM takes the same amount of time to finish.

Let $\Sigma = \Set{\mathtt{a}, \dots, \mathtt{z}}$. Program a constant-time TM for the language $\set{\mathtt{quokka}}\text.$ Save your result as res/PasswordChecker.tm. Your TM is essentially a password-checking program that is immune to timing attacks, at least in Theoryland.

The key here is "intentionally slow things down." You will need to have your TM do unnecessary work if it finds out that the password is wrong. You will also need to find a way to have the TM know, once the computation is done, whether to accept or reject the input. There are many ways to accomplish this, and we'll leave the exact design to you.

This is just the tip of the iceberg when it comes to timing attacks. On the computer security side of things, timing attacks are a special case of a broader class of attacks called side-channel attacks in which an attacker uses information leaking through some sort of "side channel" (timing information, power consumption, sounds emitted by keyboards, etc.) are used to attack cryptographic systems. You can learn more about this in CS255 (intro to cryptography) if you'd like - the rabbit hole goes deep here! On the theory side of things, constant-time TMs are a special case of a broader class of TMs called oblivious TMs, in which the runtime and tape head movements are a function purely of the input string length and not what that string is. There's a deep theorem that says that any TM can be transformed into an equivalent oblivious TM. This implies that, at least in principle, it should be possible to design programs that don't leak any timing information.

Problem Seven: What Does it Mean to Solve a Problem?

Let $L$ be a language over $\Sigma$ and $M$ be a TM with input alphabet $\Sigma$. Here are three potential traits of $M$:

$\forall w \in \Sigma^\star. M \haltson w.$
$\forall w \in \Sigma^\star. (M \accepts w \to w \in L).$
$\forall w \in \Sigma^\star. (M \rejects w \to w \notin L).$

At some level, for a TM to claim to solve a problem, it should have at least some of these properties. Interestingly, though, just having two of these properties doesn't say much.

Let $L$ be an arbitrary language. Write a TM that satisfies properties (1) and (2). No justification is necessary.

You might be wondering “how on earth is it possible to write a TM with those properties if we have no idea what $L$ is?” That is a reasonable question to ask, and gets at the heart of what this question is about. The whole point of this problem is to show that you have to be extremely careful about how you define “solving a problem,” since if you define it incorrectly then you can “solve” a problem in a way that bears little resemblance to what we’d think of as solving a problem. Keep this in mind as you work through this one.

Let $L$ be an arbitrary language. Write a TM that satisfies properties (1) and (3). No justification is necessary.

Let $L$ be an arbitrary language. Write a TM that satisfies properties (2) and (3). No justification is necessary.

Suppose $L$ is a language over $\Sigma$ for which there is a TM $M$ that satisfies all of properties (1), (2), and (3). What can you say about $L$? Prove it.

In the earlier parts of this problem you showed that just having two of these three properties doesn't tell you much about $L$. As you're writing your proof for this problem, make sure you see exactly where you're citing all three of the relevant properties. If you don't use all three properties, chances are that your proof is incorrect.

Optional Fun Problem: TMs and Regular Languages

Let $M$ be a constant-time TM and let $L$ be the language it decides. Prove that $L$ is regular.