Practice Diagnostic 1

Here is the note on logistics that would normally comprise the first page of the actual exam.

Intro Announcement

You may consult inanimate resources during the diagnostic but are prohibited from communicating with any other humans except the course staff. This means, for example, that it is permissible to look in the textbook, visit the course website, to open Qt Creator and run code, and review your own code. However, you are not allowed to discuss with others through any channel, nor are you permitted to post the questions anywhere where other humans could see them or offer advice or suggestions (for example, by posting on a public Q&A website or emailing questions to other people). Note: all diagnostic submissions will be examined for similarity using the same tool we apply to assignments.

Unless otherwise indicated as part of the instructions for a specific problem, comments will not be required. Uncommented code that gets the job done will be sufficient for full credit. On the other hand, comments may help you get partial credit if they help us determine what you were trying to do. You do not need to worry about efficiency unless a problem states otherwise. You don’t need to prototype helper functions you write, and you don’t need to explicitly #include libraries you want to use.

This exam is "self-contained" in the sense that if you’re expected to reference code from the lectures, textbook, assignments, or section handouts, we’ll explicitly mention what that code is and provide sufficient context for you to use it.

Problem 1: Interpreting Code

Part A: Analyzing Code Output and Runtime

This question is designed to let you demonstrate what you’ve learned about algorithmic analysis and data structure design. Below are four functions. We picked one of those functions and ran it on many different values of n. We captured the output of that function on each of those inputs, along with the runtime. That information is printed in the table below.

Function 1

int function1(int n) { 
    Stack<int> values; 
    for (int i = 0; i < n; i++) { 
        values.push(i); 
    }
    int result; 
    while (!values.isEmpty()) { 
        result = values.pop(); 
    } 
    return result; 
} 

Function 2

int function2(int n) {
    Queue<int> values;
    for (int i = 0; i < n; i++) { 
        values.enqueue(i); 
    } 
    int result; 
    while (!values.isEmpty()) { 
        result = values.dequeue(); 
    } 
    return result; 
} 

Function 3

int function3(int n) { 
    Set<int> values; 
    for (int i = 0; i < n; i++) { 
        values.add(i); 
    } 
    int result;
    for (int value: values) { 
        result = value; 
    }
    return result; 
} 

Function 4

int function4(int n) { 
    Vector<int> values; 
    for (int i = 0; i < n; i++) { 
        values.add(i); 
    }
    int result; 
    while (!values.isEmpty()) { 
        result = values[0]; 
        values.remove(0); 
    }
    return result; 
} 

`n`	Time	Return Value
100,000	0.137s	99999
200,000	0.274s	199999
300,000	0.511s	299999
400,000	0.549s	399999
500,000	0.786s	499999
600,000	0.923s	599999
700,000	0.960s	699999
800,000	1.198s	799999
900,000	1.335s	899999
1,000,000	1.472s	999999

For each of these pieces of code, tell us its big-O runtime as a function of n. No justification is required.
For each of these pieces of code, tell us whether that function could have given rise to the return values reported in the rightmost column of the table below. No justification is required.
Which piece of code did we run? How do you know? Justify your answer in at most fifty words.

Solution

The runtimes are as follows:
- function1 runs in time O(n).
- function2 runs in time O(n).
- function3 runs in time O(n log n).
- function4 runs in time O(n^2).

Some explanations: In function1, we do n pushes followed by n pops. The cost of each stack operation is O(1), so this means we’re doing 2n operations at an effective cost of O(1) each, for a net total of O(n). The same is true about queue operations: each one takes time O(1), which is why function2 takes time O(n) as well.

For function3, inserting an element into a set takes time O(log n). This means that the cost of inserting n elements is O(n log n). The cost of iterating over the set is also O(n) so the net runtime is O(n log n).

For function4, adding n elements to the end of a vector takes time O(n). However, removing from the front of a vector with n elements also takes time O(n), since we have to shift all the other elements back one position. This means that the overall runtime is O(n^2).

The answers: function1 cannot produce the given output. function2 will always produce this output. function3 will always produce this output. function4 will always produce this output.

Some explanations: In function1, since elements are stored in a stack, the last element popped is the first element pushed, which would always be zero. Therefore, we’d expect to see a column of zeros in the table, which doesn’t match what’s actually there.

In function2, the last element removed from the queue is the last element added to the queue, which, here, would be n – 1, matching the output.

In function3, the Set type stores its elements in sorted order. Iterating over the set, therefore, will visit the elements in ascending order, so the last element iterated over by the loop would be n – 1, matching the output.

Finally, in function4, we remove elements from the vector in the reverse order in which they’re added, matching the queue’s ordering and making the last element visited exactly n – 1.

First, notice that the runtime appears to be O(n); doubling the size of the inputs roughly doubles the runtime. That leaves function1 and function2 as choices, and function1 has the wrong return value. Therefore, we must have run function2.

Part B: Tracing a Recursive Function

For each of the calls to the following recursive function below, indicate what output is produced:

void recursionMystery2(int n) { 
    if (n <= 1) { 
        cout << ": ";
    } else {
        cout << (n % 2) << " "; 
        recursionMystery2(n / 2); 
        cout << n << " "; 
    } 
}

Call	Output
a) `recursionMystery2(8);`
b) `recursionMystery2(25);`
c) `recursionMystery2(46);`

Solution

Call	Output
a) `recursionMystery2(8);`	`0 0 0 : 2 4 8`
b) `recursionMystery2(25);`	`1 0 0 1 : 3 6 12 25`
c) `recursionMystery2(46);`	`0 1 1 1 0 : 2 5 11 23 46`

Problem 2: C++ Fundamentals

Write a function named printBox that accepts two parameters: a Vector<string> holding the lines of a file, and an int for a width. Your function should go through the lines and display them to the console with a 'box' border of ‘#’ characters around them on all four sides. The second parameter to your function indicates the total width of the box including its border. You must also convert each line to "title case" by capitalizing the first letter of the line and lowercasing all subsequent letters. For example, suppose the file poem.txt contains the following text:

Your skin like dawn
Mine like musk
 
One paints the beginning
of a certain end.
 
The other, the end of a
sure beginning.

-Maya Angelou

Notice that the file might contain blank lines, which will be represented by the empty string. The passed in Vector of lines would look like this:

{"Your skin like dawn", "Mine like musk", "", "One paints the beginning", "of a certain end", "", "The other, the end of a", "sure beginning.", "", "-Maya Angelou"}

The following would be the outputs from two calls to your function:

printBox(lines, 26)

########################## 
#Your skin like dawn     # 
#Mine like musk          #
#                        # 
#One paints the beginning#
#Of a certain end.       #
#                        # 
#The other, the end of a #
#Sure beginning.         #
#                        # 
#-maya angelou           #
##########################

printBox(lines, 35)

################################### 
#Your skin like dawn              # 
#Mine like musk                   #
#                                 # 
#One paints the beginning         #
#Of a certain end.                #
#                                 # 
#The other, the end of a          #
#Sure beginning.                  #
#                                 # 
#-maya angelou                    #
###################################

You may assume that the width value passed is non-negative and that no line in the file is too wide to fit into a box of that width.

Fill in the function below:

void printBox(Vector<string> lines, int width) {
     /* FILL ME IN */
}

Solution

void printPoundLine(int width) {
    for (int i = 0; i < width; i++) {
        cout << "#";
    }
    cout << endl;
}

void printBox(Vector<string> lines, int width) {
    printPoundLine(width);
    for (string line: lines) {
        string output = "#"; 
        if (line != ""){
            output = output + toUpperCase(line.substr(0, 1)) + toLowerCase(line.substr(1));
        }
        while (output.length() < width - 1) {
            output += " ";
        }
        cout << output << "#" << endl;
    }
    printPoundLine(width);
}

Problem 3: ADTs

Google Translate is powered, in large part, by a technique called word2vec that builds an understanding of a language by looking at the context in which each word occurs.

Imagine you have a word like "well." There are certain words that might reasonably appear immediately before the word "well" in a sentence, like "feeling," "going," "reads," etc., and some words that that are highly unlikely to appear before "well," like "cake," "circumspection," and "election." The idea behind word2vec is to find connections between words by looking for pairs of words that have similar sets of words preceding them. Those words likely have some kind of connection between them, and the rest of the logic in word2vec works by trying to discover what those connections are.

Your task is to write a function

Map<string, Set<string>> predecessorMap(Vector<string>& input);

that takes as input a Vector of strings representing the lines of a file, then returns a Map<string, Set<string>> that associates each word in the file with all the words that appeared directly before that word. For example, given a Vector containing the lines of JFK’s famous quote:

"Ask not what your country can do for you;
ask what you can do for your country,"

that looks like this

{""Ask not what your country can do for you;", "ask what you can do for your country,""}

your function should return a Map with these key/value pairs:

"not" : { "ask" } 
"what" : { "not", "ask" } 
"your" : { "what", "for" } 
"country" : { "your" } 
"can" : { "country", "you" } 
"do" : { "can" } 
"for" : { "do" } 
"you" : { "for", "what" }
"ask" : { "you" }

Notice that although the word "ask" appears twice in the quote, the first time it appears it’s the first word in the file and so nothing precedes it. The second time it appears, it’s preceded by a word from a different line, but before that is the word "you," which is what ultimately appears in the output map.

Some notes on this problem:

You can assume you have access to a function

Vector<string> tokenize(string line);

that takes in a line, accurately tokenizes it into cleaned words (stripped of all punctuation, but with the original casing retained), and returns all of the words in the line as a Vector of strings.

If a word in a line contains no letters, it gets stripped to the empty string by the aformentioned tokenize function. Your map-building process should completely ignore these tokens.
Your code should be case-insensitive, so it should return the same result regardless of the capitalization of the words in the file. The capitalization of the keys in the map is completely up to you.
It is not guaranteed that the file has any words in it, and there’s no upper bound on how big the file can be.

Fill in the function below:

Vector<string> tokenize(string line); // Provided to you

Map<string, Set<string>> predecessorMap(Vector<string>& input) {
    /* FILL ME IN */
}

Solution

Map<string, Set<string>> predecessorMap(Vector<string>& input) {
    Map<string, Set<string>> result;
    string lastWord;  // Most-recently-read string, initially empty as a sentinel.
    
    /* Loop over all lines */
    for (string line: input) {
        Vector<string> tokens = tokenize(line);
        for (string token: tokens) {
            if (token != "") {
                token = toLowerCase(token);
                if (lastWord != "") {
                    result[token].add(lastWord);
                }
                lastWord = token;
            }
        }
    }
    return result;
}

Problem 4: Recursion

Chapter exercise 7.7 from the textbook

Write a recursive function digitSum(n) that takes a nonnegative integer and returns the sum of its digits. For example, calling digitSum(1729) should return 1 + 7 + 2 + 9, which is 19.

The recursive implementation of digitSum depends on the fact that it is very easy to break an integer down into two components using division by 10. For example, given the integer 1729, you can divide it into two pieces as follows:

                            1729
                             / \
                          127   9

Each of the resulting integers is strictly smaller than the original and thus represents a simpler case.

Solution

int digitSum(int n)
{
    if (n < 10) {
        return n;
    } else {
        return digitSum(n / 10) + digitSum(n % 10);
    }
}

Problem 5: Recursive Backtracking

Write a recursive function named divide that uses backtracking to divide the letters from word w into two words s1 and s2. Each letter from w must be used exactly once either in s1 or s2, and the relative order of the letters must be preserved, meaning that s1 and s2 must both be subsequences of w.

For example, the word friendly can be divided into find + rely. This solution is valid because s1 and s2 together contain all of the letters from w, the letters in s1 and s2 appear in the same relative order as they did in w, and both s1 and s2 are valid words in the English dictionary.

For some words, no valid division is possible. For example, consider dividing the word stream. It would not be valid to divide into star + me (relative order of letters not preserved), nor stem + ram (reuses 'm'), nor stem + a ('r' not used), nor seam + tr (not in the dictionary).

You will be passed two parameters: a reference to a Lexicon of dictionary words, and the string w to divide. Use the Lexicon passed to ensure that the divided words are both found in the dictionary. If the function finds a valid division, it prints the two words and searches no further. If no valid division is found, the function prints nothing.

The following table lists some example calls to your function and possible console output, given a Lexicon called dict. Rows showing blank output indicate calls that were unable to find any successful division of the given word.

Call	Output
`divide(dict, "friendly")`	`find + rely`
`divide(dict, "standing")`	`sting + and`
`divide(dict, "midterm")`	`mid + term`
`divide(dict, "recuperate")`	`cure + repeat`
`divide(dict, "stream")`
`divide(dict, "atomic")`
`divide(dict, "czar")`
`divide(dict, "xyz")`

If there are multiple valid divisions for word w, you may print any of them. Regardless of which division you print, your code must stop after finding a single solution; do not find/print all possible solutions.

Efficiency: While your code is not required to have any particular Big-Oh, you may lose points if your code is extremely inefficient or repeatedly re-explores the same path multiple times. In particular, your recursive exploration should not continue exploring a path where the prefix of the word you are examining cannot possibly make a real English word. Use the Lexicon to help you prune such unwanted searches. Pass all data structures by reference to avoid making copies.

Helpers: As with any coding problem on this exam, you may write helper functions as needed to help solve the problem.

Constraints: For full credit, obey the following restrictions. A solution that disobeys them can get partial credit.

Do not declare any global variables.
Your overall algorithm must be recursive and must use backtracking techniques to generate the results.

Solution

void divide(Lexicon& lex, string input) {
    divideHelper(lex, input, "", "");
}

bool divideHelper(Lexicon& lex, string input, string left, string right) {
    if (input.empty()) {
        if (lex.contains(left) && lex.contains(right)) {
            cout << left << " + " << right << endl;
            return true;
        } else {
            return false;
        }
    } else if (!lex.containsPrefix(left) || !lex.containsPrefix(right)) {
        return false;
    } else {
        char ch = input[0];
        string rest = input.substr(1);
        return (divideHelper(lex, rest, left + ch, right)
                || divideHelper(lex, rest, left, right + ch));
    }
}