Section 1. C++ fundamentals

Section materials curated by Trip Master, drawing upon materials from previous quarters.

Update: Due to the power outage, Section 1 has been cancelled.

This week’s section exercises explore the very fundamentals of programming in C++. We'll be exploring the material from Week 1 (functions, parameters, return, decomposition, strings, and testing). Have fun!

Each week, we will also be releasing a Qt Creator project containing starter code and testing infrastructure for that week's section problems. When a problem name is followed by the name of a .cpp file, that means you can practice writing the code for that problem in the named file of the Qt Creator project. Here is the zip of the section starter code:

Note: we intentionally put in more problems on section handouts than you can cover in section. You can use the extra problems for practice and review, especially for exams. Due to our shortened week, we will have slightly less material on this handout.

📦 Starter code

1) Returning and Printing

Topics: Function call and return, return types

Below is a series of four printLyrics_v# functions, each of which has a blank where the return type should be. For each function, determine

what the return type of the function should be,
what value, if any, is returned, and
what output, if any, will be produced if that function is called.

Is it appropriate for each of these functions to be named printLyrics? Why or why not?

_____ printLyrics_v1() {
    cout << "Havana ooh na na" << endl;
}
_____ printLyrics_v2() {
    return "Havana ooh na na";
}
_____ printLyrics_v3() {
    return "H";
}
_____ printLyrics_v4() {
    return 'H';
}

Solution

void printLyrics_v1() {
    cout << "Havana ooh na na" << endl;
}

string printLyrics_v2() {
    return "Havana ooh na na";
}

string printLyrics_v3() {
    return "H";
}

char printLyrics_v4() {
    return 'H';
}

Of these four functions, only printLyrics_v1 will print anything. Specifically, it prints out the string "Havana ooh na na.". The name “printLyrics” is inappropriate for the other functions, as those functions don’t actually print anything. 😃

The function printLyrics_v1 doesn’t return anything – it just sends information to the console. As a result, its return type should be void. The functions printLyrics_v2 and printLyrics_v3 each return strings, since C++ treats anything in double-quotes as a string. Finally, printLyrics_v4 returns a char, since C++ treats anything in single-quotes as a character.

2) Pig-Latin (`piglatin.cpp`)

Topics: Strings, reference parameters, return types

Write two functions, pigLatinReturn and pigLatinReference, that accept a string and convert said string into its pig-Latin form. To convert a string into pig-Latin, you must follow these steps:

Split the input string into 2 strings: a string of characters BEFORE the first vowel, and a string of characters AFTER (and including) the first vowel.
Append the first string (letters before the first vowel) to the second string.
Append the string "ay" to the resulting string.

Here are a few examples…

nick -> icknay

chase -> asechay

chris -> ischray

You will need to write this routine in two ways: once as a function that returns the pig-Latin string to the caller, and once as a function that modifies the supplied parameter string and uses it to store the resulting pig-Latin string. These will be done in pigLatinReturn and pigLatinReference, respectively. You may assume that your input is always a one-word, all lowercase string with at least one vowel.

Here's a code example of how these functions differ…

string name = "julie";
string str1 = pigLatinReturn(name);
cout << str1 << endl; // prints "uliejay"

pigLatinReference(name);
cout << name << endl; // prints "uliejay"

Once you've written these functions, discuss with your section the benefits and drawbacks of these two approaches. Which do you feel is easier to write? Which do you think is more convenient for the caller? Do you think one is better style than the other?

Solution

// Use const because VOWELS won't change -- no need to declare repeatedly
// in isVowel.
const string VOWELS = "aeiouy";

// Helper function, which I'd highly recommend writing!
bool isVowel(char ch) {
    // A little kludgy, but the handout guarantees that
    // ch will ALWAYS be lower case :)
    // NOTE: For an assignment, you probably want a more robust isVowel.
    return VOWELS.find(ch) != string::npos;
}

string pigLatinReturn(string input) {
    int strOneIndex = 0;
    for (int i = 0; i < input.length(); i++) {
        if (isVowel(input[i])) {
            strOneIndex = i;
            break;
        }
    }
    string strOne = input.substr(0, strOneIndex);
    string strTwo = input.substr(strOneIndex);
    return strTwo + strOne + "ay";
}

void pigLatinReference(string &input) {
    int strOneIndex = 0;
    for (int i = 0; i < input.length(); i++) {
        if (isVowel(input[i])) {
            strOneIndex = i;
            break;
        }
    }
    string strOne = input.substr(0, strOneIndex);
    string strTwo = input.substr(strOneIndex);
    input = strTwo + strOne + "ay";
}

Notice how similar these two approaches are – the only difference is how the result is handled at the very end. To address the discussion questions, although the pigLatinReference function is marginally more efficient because it doesn't need to make a copy of the input string, pigLatinReturn is probably more intuitive for both the caller and the writer: if the function's job is to somehow output some product, returning is the most explicit way to do so. In that way, a function that returns is also better style – it's makes the purpose of the function clearer to the reader.

If you wanted to combine the efficiency of pigLatinReference with the clarity of pigLatinReturn, I would recommend writing a function that takes in the input string by const reference, basically

string pigLatin(const string &input);

Although the const isn't explicitly necessary, it's nice to have because you never need to modify input. Moreover, you still get the efficiency gains from pass-by-reference while also writing very-understandable code.

3) DNA Detectives (bestindex.cpp)

Topic: strings, testing

The four types of bases in DNA are adenine (A), cytosine(C), guanine (G), and thymine (T). DNA consists of pairs of these bases wound together in long sequences. The way these bases pair up is as follows:

adenine (A) pairs with thymine (T) and vice-versa

cytosine (C) pairs with guanine (G) and vice-versa

You are given a function called bestIndexInSequence(), which takes in two strings, sequence and sample, and returns the int index of the location in sequence that has the most pairs with sample. Below is a short example:

string testSequence = "TCTTC";
string testSample = "AGT";

cout << bestIndexInSequence(testSequence, testSample) << endl; // Prints '0'

The result of the above code is 0, because if you align testSample at index 0 of testSequence, you get 2 matches, an A-T match, and a C-G match. No other valid starting index yields more than a single match, so 0 is the best index to return.

A few details about this implementation:

If there are multiple valid indices that yield the most matches, you can return any index.
You may assume sequence is always at least as large as sample.
You may not return an index that causes sample to 'go off' the sequence string. In the example above, a return value of 3 would be invalid (the 'T' in testSample would not be matched with anything).
You may assume that both input strings are always well-formatted (all capitalized and only A, T, C, or G).

We have provided an implementation of bestIndexInSequence() for you in bestindex.cpp. Although the provided code is complete, it contains zero tests. Your task is to use the SimpleTest framework to verify the correctness of the program. If you find any bugs, please discuss them with your section.

Solution

Here's the test suite we used to uncover functionality issues in the provided code:

STUDENT_TEST("Verify getMatches identifies all kinds of matches."){

    // Case 1: A simple 4-for-4.
    string s1 = "ATCG";
    string s2 = "TAGC";
    EXPECT_EQUAL(getMatches(s1,s2), 4);

    // Case 1.5: Make sure order doesn't matter. This would
    // come into play if the writer did some hard-codey stuff.
    EXPECT_EQUAL(getMatches(s2,s1), 4);

    // Case 2: An example with mismatches.
    string s3 = "AAAA";
    EXPECT_EQUAL(getMatches(s1, s3), 1);
}

STUDENT_TEST("Test that bestIndexInSequence does not go off the SEQUENCE string."){
    // Use a modified example from the handout, which could produce the
    // wrong answer if getMatches does something incorrect with the edge case.
    string s1 = "TTCTTC";
    string s2 = "AGT";
    EXPECT_EQUAL(bestIndexInSequence(s1, s2), 1);
}

STUDENT_TEST("Verify bestIndexInSequence actually returns the best index."){

    // Case 1: Perfect match.
    string s1 = "TCGATCAGTCCC";
    string s2 = "AGT";
    EXPECT_EQUAL(bestIndexInSequence(s1, s2), 4);

    // Case 2: Imperfect match, single best index.
    string s3 = "TCTTC";
    string s4 = "AGT";
    EXPECT_EQUAL(bestIndexInSequence(s3, s4), 0);

    // Case 3: Multiple possible best indices.
    string s5 = "TCGATCAGTCCTCAC";
    string s6 = "AGT";
    bool firstMatch = bestIndexInSequence(s5, s6) == 4;
    bool secondMatch = bestIndexInSequence(s5, s6) == 11;
    // A finer point -- you can use EXPECT wherever you can use
    // EXPECT_EQUAL, but sometimes it makes more sense to use one
    // or the other.
    EXPECT(firstMatch || secondMatch);
}

There are actually a few bugs in the provided code:

The for loop should not loop for the length of the sequence – in reality it should stop at the index within sequence that aligns sample perfectly with it's last characters. This is sequence.length() - sample.length().
When we update bestIndex, we should also be updating bestMatches so that we actually know when to update the best index.
getMatches doesn't actually check for base pair matches – it just checks that the characters match.

If you're interested, here's a fixed version of the code:

As a side thought, consider the function that returns whether 2 chars represent a valid ATCG base pair match. Can you think of a more concise or clear way to implement that functionality?

int getMatches (string s1, string s2);
bool baseMatch (char c1, char c2);

/**
 * @brief bestIndexInSequence - returns the index within
 * SEQUENCE with the most base-pair matches
 * @param sequence - a string sequence of ATCG bases.
 * @param sample - a smaller sequence of ATCG bases that
 * we are comparing against subsequences of SEQUENCE to
 * find the index that results in the most matches.
 * @return - the integer value signifying the index with
 * the most base pair matches.
 *
 *
 * As a reference, A matches with T, C matches with G
 */
int bestIndexInSequence (string sequence, string sample) {
    int bestMatches = -1;
    int bestIndex = 0;
    int maxIndex = sequence.length() - sample.length();
    for (int i = 0; i < maxIndex; i++) {
        int numMatches = getMatches(sequence.substr(i, sample.length()), sample);
        if (numMatches >= bestMatches) {
            bestMatches = numMatches;
            bestIndex = i;
        }
    }
    return bestIndex;
}

/**
 * @brief getMatches - returns the number of ATCG matches between s1 and s2.
 * Assumes s1 and s2 are the same length.
 * @param s1 - the first string, representing a portion of SEQUENCE.
 * @param s2 - the second string, equivalent to SAMPLE.
 * @return - the integer value signifying how many AT or CG pairs exist between
 * the two strings.
 */
int getMatches (string s1, string s2) {
    int numMatches = 0;
    for (int i = 0; i < s1.length(); i++) {
        if (baseMatch(s1[i], s2[i])) numMatches++;
    }

    return numMatches;
}

/**
 * @brief baseMatch - returns true if c1 and c2 consist of a proper
 * nucleotide match, namely, 'A' matching with 'T', and 'C' matching with 'G'.
 * @param c1
 * @param c2
 * @return - a boolean signifying the validity of the match.
 */
bool baseMatch (char c1, char c2) {
    switch (c1) {
        case 'A':
            return c2 == 'T';
        case 'T':
            return c2 == 'A';
        case 'C':
            return c2 == 'G';
        case 'G':
            return c2 == 'C';
        default:
            return false;
    }
}

4) It's All About Style!

Topic: Style

Thanks to Lauren Saue-Fletcher for this problem!

Style is a really important part of programming. Writing elegant code is easier for others to read - including your future self at 11 pm the night before the assignment is due (please start assignments early :))). It makes your code easier to change and make additions to, and it also aids code functionality. When you write code with good style, it's easier and quicker to spot bugs. Once you're in the habit of focusing on style while coding, you'll also write code with fewer bugs! We also grade assignments on style.

Now assume that you are now a section leader for CS106B. Here is a submission from one of your students - what feedback would you give?

A reminder- style isn't about writing fewer lines of code. Style is about making code more readable and clear. That said, writing code with good style will often be shorter! See the style guide for some tips! There are many style errors below - try to spot and fix as many as you can!

a)

// DO NOT USE THIS CODE AS AN EXAMPLE OF GOOD STYLE!!!
char LETTER_A = 'A';
/**
Takes in a string s and returns whether
it contains an A
**/
bool containsA(string S) {
    bool sawAnA = false;
    for (char z: s) {
        if (z == LETTER_A) sawAnA = true;
    }
    if (sawAnA) {
        return true;
    }
}

b)

// ***************************
// WARNING: STYLE BUGS AHEAD !!!!
// ***************************

const char LETTER_A = 'A';

int countA(string s) {
    if (s == "") {
        return 0;
    }

    int count_numA = 0;

    int i = 0;
    while (i < s.length()) 
    {
        if (s[i] == LETTER_A) {
            count_numA++;
            i++;
        } 
        if (s[i] != LETTER_A) {
            i++;
    }}

    return count_numA;
}

Solution

Problem A

There's a lot to unpack here! After one pass we might improve the above code to be the following (changes marked by inline comments)

// DO NOT USE ALL OF THIS CODE AS AN EXAMPLE OF GOOD STYLE!!!
const char LETTER_A = 'A';          // Changed to const

/**
Takes in a string s and returns whether
it contains an A
**/
bool containsA(string s) {          // non-const var names should be lower case.
    bool sawAnA = false;
    for (char curr: s) {              // more descriptive variable name: curr.
        if (curr == LETTER_A) {
            sawAnA = true;
        }
    }

    return sawAnA;                  // directly return the boolean.
}

This returning paradigm ends up being a really common pattern in CS - arguably the most elegant approach avoids a boolean variable altogether! While boolean variables aren’t inherently bad, they often hide a better approach:

// USE ME AS AN EXAMPLE OF GOOD STYLE!

char LETTER_A = 'A';

/**
Takes in a string s and returns whether
it contains an A
**/
bool containsA (string s) {
    for (char c: s) {
        if (c == LETTER_A) {
            return true;
        }
    }
    return false;
}

A side benefit of this approach is that we also return the moment we know we’ve seen an A! As a note, this logic often has to be in it’s own function for the return true/false to work out - so decompose this logic out into a helper function when needed.

Finally a note that the stanford c++ library actually contains a fucntion that checks to see if another string contains some given substring. So technically, it would be the best style to just use that! Check out stanford documentation under the “resources” tab on the website.

Problem B

On our first pass we can correct a lot of small style errors.

// ***************************
// WARNING: STYLE BUGS AHEAD !!!!
// ***************************

const char LETTER_A = 'A';

/**
Returns the number of A's in the passed in string       // Add a function comment
**/
int countA(string s) {
    if (s == "") {
        return 0;
    }

    int countNumA = 0;      // be consistent with lowerCamelCase

    int i = 0;
    while (i < s.length()) {  // be consistent with bracket location
        if (s[i] == LETTER_A) {
            countNumA++;
            i++;
        } else {  // use if/else if/else pattern whenever applicable                             
            i++;
        }
    }             // keep bracketing clean

    return countNumA;
}

It turns out, we can do even better!

// ***************************
// WARNING: STYLE BUGS AHEAD !!!!
// ***************************
const char LETTER_A = 'A';

/**
Returns the number of A's in the passed in string       
**/
int countA(string s) {
    if (s == "") {
        return 0;
    }

    int countNumA = 0;                             

    int i = 0;
    while (i < s.length()) {                      
        if (s[i] == LETTER_A) {
            countNumA++;
        } 
        i++;                        // factor out duplicate code
    }                                              

    return countNumA;
}

This is the final product!

----
This is the final product!

const char LETTER_A = 'A';

/**
Returns the number of A's in the passed in string       
**/
int countA(string s) {
    if (s == "") {
        return 0;
    }

    int countNumA = 0;                             

    for (int i = 0; i < s.length(); i++) {      // use a for loop
        if (s[i] == LETTER_A) {
            countNumA++;
        } 
    }                                              

    return countNumA;
}

Section 1. C++ fundamentals

Update: Due to the power outage, Section 1 has been cancelled.

These materials are provided for you to work through. While they are not required, we strongly recommend looking through these problems and posting on Ed, asking your SL, or coming to Office Hours / LaIR if you have any questions!

1) Returning and Printing

2) Pig-Latin (piglatin.cpp)

3) DNA Detectives (bestindex.cpp)

4) It's All About Style!

a)

b)

2) Pig-Latin (`piglatin.cpp`)