Section2: ADTs and Big-O

Section materials curated by our Head TA Trip Master, drawing upon materials from previous quarters. Special thanks to
Nick Bowman and Julie Zelenski for awesome ideas!

This week’s section exercises explore ADT and Big-O problems. By the end of this week, you'll be well-versed in all kinds of ADTs, and you'll be able to think critically about program runtime. What a time to be alive!

Remember that every week we will also be releasing a Qt Creator project containing starter code and testing infrastructure for that week's section problems. When a problem name is followed by the name of a .cpp file, that means you can practice writing the code for that problem in the named file of the Qt Creator project. Here is the zip of the section starter code:

📦 Starter code

1) Collection Mystery

Topics: Stacks and queues

void collectionMystery(Stack<int>& s) 
{ 
    Queue<int> q;
    Stack<int> s2;

    while (!s.isEmpty()) {
       if (s.peek() % 2 == 0) {
            q.enqueue(s.pop()); 
        } else {
            s2.push(s.pop());
        }
    }
    while (!q.isEmpty()) {
        s.push(q.dequeue()); 
    }
    while(!s2.isEmpty()) { 
        s.push(s2.pop());
    }
    cout<< s << endl;
}

Write the output produced by the above function when passed each of the following stacks. Note that stacks and queues are written in front to back order, with the oldest element on the left side of the queue/stack.

Stacks:

{1, 2, 3, 4, 5, 6}                ________________________________________
{42, 3, 12, 15, 9, 71, 88}        ________________________________________
{65, 30, 10, 20, 45, 55, 6, 1}    ________________________________________

Solution

 {6, 4, 2, 1, 3, 5}
 {88, 12, 42, 3, 15, 9, 71}
 {6, 20, 10, 30, 65, 45, 55, 1}

2) Oh No, Big-O, Too Slow

Topics: Big-O, code analysis

Give a tight bound of the nearest runtime complexity class for each of the following code fragments in Big-Oh notation, in terms of the variable N.

Code Snippet A

int sum = 0;
for (int i = 1; i <= N + 2; i++) {
    sum++;
}
for (int j = 1; j <= N * 2; j++) {
    sum++;
}
cout << sum << endl;

Code Snippet B

int sum = 0;
for (int i = 1; i <= N - 5; i++) {
    for (int j = 1; j <= N - 5; j += 2) {
        sum++;
    }
}
cout << sum << endl;

Code Snippet C

int sum = 0;
for (int i = 0; i < 1000000; i++) {
    for (int j = 1; j <= i; j++) {
        sum += N;
    }
    for (int j = 1; j <= i; j++) {
        sum += N;
    }
    for (int j = 1; j <= i; j++) {
        sum += N;
    }
}
cout << sum << endl;

Solution

Code Snippet A has a runtime complexity of O(N).
Code Snippet B has a runtime complexity of O(N^2).
Code Snippet C has a runtime complexity of O(1).

3) DoubleQueue (`doubleQueue.cpp`)

Topic: Queues

Write a function

void doubleQueue(Queue<int>& q)

that accepts a reference to a Queue of integers as a parameter and replaces every element with two copies of itself. For example, if q stores {1, 2, 3}, the call of doubleQueue(q) should change it to store {1, 1, 2, 2, 3, 3}.

What's the runtime of your doubleQueue implementation?

Let's assume (whimsically) that ALL Queue operations take O(n) time. What would the runtime of your code be now?

Solution

void doubleQueue(Queue<int>& q) 
{
    int size = q.size();
    for (int i = 0; i < size; i++) {
        int n = q.dequeue(); 
        q.enqueue(n); 
        q.enqueue(n);
    }
}

The runtime of this code is O(n), where n is the size of the Queue Q. If Queue operations were O(n), runtime would be O(n^2). Note that the 3n inside our loop gets turned into n operations!

Classic "gotcha" -> Students love iterating through containers while modifying them. See what happens if you get rid of the line that declares SIZE; you'll be running for quite a while.

4) New phone, who dis? (`contacts.cpp`)

Topics: Maps, debugging

Imagine you are tasked with writing the software on a cell phone that manages the "contacts" application. Namely, it's your job to maintain the database of phone number to name mappings on the phone. This is stored as a Map<string, string>, where the first string is a phone number, and the second string is the correponding contact. Here's the current code that executes a lookup in the map:

// Returns the name associated with the provided PHONENUMBER.
// Returns empty string if the number does not exist in the map.
string findContact(string phoneNumber, Map<string, string>& contacts) 
{ 
    // If PHONENUMBER does not exist in CONTACTS,
    // The Map access will return "". 
    if (contacts[phoneNumber] == "") {
        return "";
    } else {
        return contacts[phoneNumber];
    }
}

⚠️Warning!⚠️

Although the above code passes the provided functionality tests, it contains a bug! Can you step through one of the PROVIDED_TESTS in the debugger to uncover what the problem is? Can you write a STUDENT_TEST that exposes the bug? How would you fix the bug?

Lastly, what's the harm in this bug? If it passes all functionality tests, is it worth fixing? Why or why not?

Solution

This bug actually came up as a common pitfall in a midterm question a few years ago.

When you look up an element in a Map that doesn't exist, the Map will automatically insert that element into the map with the default value (0 or empty string, or some equivalent). This means that the size of your map increases if you try and access into the Map with a brand-new key.

Is this a problem? Although for basic functionality it may not be, you certainly don't want spurious entires clogging up your database. This problem will also compound over time – Map lookup is O(log(n)), and your lookups will end up suffering artificially slower runtimes!

You could expose this test by verifying the size of the map is always equal to the number of contacts added.

The fix here is to use containsKey() instead – it doesn't auto-insert. Alternatively, you could just add const to the Map declaration in findContact, and that will also prevent auto-insertion.

Here's a ternary operator for free, too ;).

// containsKey() solution
string findContact(string phoneNumber, Map<string, string>& contacts) 
{ 
    return contacts.containsKey(phoneNumber) ? contacts[phoneNumber] : "";
}

// const solution
string findContact(string phoneNumber, const Map<string, string>& contacts) 
{ 
    return (contacts[phoneNumber] != "") ? contacts[phoneNumber] : "";
}

5) Check Balance (`balance.cpp`)

Topic: Stacks

Write a function

int checkBalance(string code)

that accepts a string of source code and uses a Stack to check whether the braces/parentheses are balanced. Every ( or { must be closed by a } or ) in the opposite order. Return the index at which an imbalance occurs, or -1 if the string is balanced. If any ( or { are never closed, return the string's length.

Here are some example calls:

//   index    0123456789012345678901234567
checkBalance("if (a(4) > 9) { foo(a(2)); }") 
// returns -1 (balanced)

//   index    01234567890123456789012345678901
checkBalance("for (i=0;i<a;(3};i++) { foo{); )")
// returns 15 because } is out of order

//   index    0123456789012345678901234
checkBalance("while (true) foo(); }{ ()")
// returns 20 because } doesn't match any {

//   index    01234567
checkBalance("if (x) {")
// returns 8 because { is never closed

Solution

int checkBalance(string code) {
    Stack<char> parens;
    for (int i = 0; i < (int) code.length(); i++) {
        char c = code[i];
        if (c == '(' || c == '{') {
        parens.push(c);
        } else if (c == ')' || c == '}') {
            if (parens.isEmpty()) {
                return i;
            }
            char top = parens.pop();
            if ((top == '(' && c != ')') || (top == '{' && c != '}')) {
                return i;
            }
        }
    }

    if (parens.isEmpty()) {
        return -1; // balanced
    } else {
        return code.length();
    }
}

6) Friend List (`friendlist.cpp`)

Topic: Maps, file reading

Write a function

Map<string, Vector<string>> friendList(String filename)

that takes in a file name and reads friend relationships from a file and writes them to a Map. friendList should return the populated Map. Friendships are bi-directional, so if Grant is friends with Jose, Jose is friends with Grant. The file contains one friend relationship per line, with names separated by a single space. You do not have to worry about malformed entries.

If an input file named buddies.txt looked like this:

Jin-Hee Grant
Grant Jose
Jose Ayelet
Ayelet Ryan
Ryan Jin-Hee

Then the call of friendList("buddies.txt") should return a resulting map that should like this (assuming you traverse the input file top-down):

{"Ayelet":{"Jose", "Ryan"}, "Grant":{"Jin-Hee", "Jose"}, "Jin-Hee":{"Grant", "Ryan"}, "Jose":{"Grant", "Ayelet"}, "Ryan":{"Ayelet", "Jin-Hee"}}

Disclaimer: For those of you who are worried about the sample file data, fret not. All of our Section Leaders are friends!

Bonus: Once you've coded this up, take a look at the example output of this function. Is there anything weird about the ordering of they keys (i.e. did something behind the scenes reorder the keys)? Once you've learned about Big-O, take a look at the documentation for Maps and see if its insertion / lookup times match that of any other common Stanford Data structures. What might that imply about the underlying implementation of the Stanford Map? Check in with a member of the teaching team to verify your findings.

Solution

Map<string, Vector<string>> friendList(string filename) {
    ifstream in;
    Vector<string> lines;

    if (openFile(in, filename)) {
        readEntireFile(in, lines);
    }

    Map<string, Vector<string> > friends;
    for (string line: lines) {
        Vector<string> people = stringSplit(line, " ");
        string s1 = people[0];
        string s2 = people[1];
        friends[s1] += s2;
        friends[s2] += s1;
    }
    return friends;
}

For those interested, the backing implementation of a Stanford Map is actually a Stanford Set. Sets, although theoretically not ordered, are backed by a special data structure called a Binary Search Tree (you'll learn about these in a few weeks!) The Binary Search Tree has a couple of cool properties, but one of them is O(log(n)) insertiion and lookup (so long as the tree is balanced). The structure of the tree also orders the elements so that a traversal (or a print dump) reveals that the elements are interally sorted. Try it yourself – insert 100 random integers into a Stanford Set and then print them out!

7) Twice (`twice.cpp`)

Topic: Sets, Big-O

Write a function

Set<int> twice(Vector<int>& v)

that takes a vector of integers and returns a set containing all the numbers in the vector that appear exactly twice.

Example: passing {1, 3, 1, 4, 3, 7, -2, 0, 7, -2, -2, 1} returns {3, 7}.

What's the Big-O runtime of your solution?

Bonus I: Do the same thing, but you are not allowed to declare any kind of data structure other than sets.

Bonus II: (Independently from Bonus I) Implement a working solution that runs in O(n) time, where n is the length of the input vector. Can you do better than O(n)? Why or why not?

Solution

// solution
Set<int> twice(Vector<int>& v) {
    Map<int, int> counts;
    for (int i : v) {
        counts[i]++;
    }
    Set<int> twice;
    for (int i : counts) {
        if (counts[i] == 2) {
            twice += i;
        }
    }
    return twice;
}

The runtime of your code may depend on your implementation, but the above implementation runs in O(nlog(n)).

Justification: Because we use a map, each access to COUNTS is a log(n) operation, which we do for every element in V. When we iterate through COUNTS and add to our set TWICE – in the worst case we have (n/2) * 2 log(n) operations, which we can simplify to nlog(n).

In our case, the use of a hashet would not greatly affect the runtime because the nlog(n) is present when we add to our map. If you used a HashMap and a HashSet to the above solution, however, you could achieve O(n) runtime because map population would become O(n), and both the map access and the set insertion would be O(1), leading to an O(n) total runtime!

You can't do better than O(n) because you can't account for a number's frequency without first examining its value.

// bonus 1
Set<int> twice(Vector<int>& v) {
    Set<int> once;
    Set<int> twice;
    Set<int> more;
    for (int i : v) {
        if (once.contains(i)) {
            once.remove(i);
            twice.add(i);
        } else if (twice.contains(i)) {
            twice.remove(i);
            more.add(i);
        } else if (!more.contains(i)) {
            once.add(i);
        }
    }
    return twice;
}

Section2: ADTs and Big-O

1) Collection Mystery

2) Oh No, Big-O, Too Slow

3) DoubleQueue (doubleQueue.cpp)

4) New phone, who dis? (contacts.cpp)

5) Check Balance (balance.cpp)

6) Friend List (friendlist.cpp)

7) Twice (twice.cpp)

3) DoubleQueue (`doubleQueue.cpp`)

4) New phone, who dis? (`contacts.cpp`)

5) Check Balance (`balance.cpp`)

6) Friend List (`friendlist.cpp`)

7) Twice (`twice.cpp`)