Lab 2 Extra Problem

Go back to this week's lab writeup here.

This optional problem is an opportunity to further exercise your understanding of strings.

Code Study: atoi

(takes about 20-30min)

In assign0, you used the atoi function to convert a string argument to an integer. The function name comes from ascii to integer. As you found out in assign0, atoi is convenient to use, but not particularly robust. It has largely been superseded by strtol (used in assign1), which provides more features, but is more complicated.

Below is an implementation of atoi based on the implementation from musl. This code is included in the atoi.c program so you can compile and run it. Note that not all stylistic decisions below are ones you should emulate - we use this code mostly as a case study of how to understand core implementations.

 1    int musl_atoi(const char *s) {
 2        while (isspace(*s)) {
 3            s++;
 4        }
 5
 6        bool negative = false;
 7        switch (*s) {
 8            case '-':
 9                negative = true;
10            case '+':
11                s++;
12        }
13   
14        /* Compute as a negative number to avoid overflow on INT_MIN */
15        int number = 0;
16        while (isdigit(*s)) {
17            number = 10 * number - (*(s++) - '0');
18        }
19    
20        return negative ? number : -number;
21    }

Let's trace through the function. Here are some questions to guide you:

  • Q: [Line 2] What does the expression *s evaluate to? What is another equivalent way to write this expression? (Perhaps this alternate way would be preferred for readability, but alas...)
  • Q: [Line 3] What does the expressions++ do to s?
  • [Lines 6-12] If the string begins with a leading minus, at what point does it advance s past that char? This is subtle! (The fallthrough behavior of a switch statement can be surprising and often triggered by accident, so a comment noting there is intentionally no break statement would probably have been a good idea, but alas...)
  • The code has two loops that advance along the string, but neither has an explicit test to stop at the null terminator. Q: Why is this not necessary?
  • Q: How is a single digit character converted to its numeric value? How is that value combined with the number being built up so far?
  • [Line 17] One quirky note about s++ is that the expression not only modifies s, but it also evaluates to the original value of s (Check out this link for more about this quirky syntax). In other words, s++ will modify s but then evaluate to s rather than the incremented value. Q: How is this used to make the string processing more concise?
  • [Lines 15-20] The loop builds up n as a negative value and later negates it. The comment indicates this choice avoids overflow on INT_MIN. Q: Why does INT_MIN seem to necessitate its own special case? Why does atoi construct a numeric value as a negative number instead of a positive number?
  • Q: What happens when the input string contains a non-digit char? What if that non-digit char is the very first character?
  • [Line 20] Note that the ternary expression (with the question mark and colon) is evaluated using the form "CONDITION ? IF_TRUE : IF_FALSE". Q: Try applying this to interpret the return statement above.

Compile the included atoi program and do some testing. Identify an input of interest and first manually trace how the function will process it, then step through the program in gdb to verify your understanding is correct. Try some inputs that should successfully convert, e.g. ./atoi 107 0 -12, then consider inputs that are problematic or malformed, e.g. ./atoi 3.14 binky @55 --8. What about an input that is outside the representable range of integer? If the string is a number larger than INT_MAX, will it overflow, saturate, return 0, raise an error, or do something else entirely?