Solutions by Nick Troccoli and Lisa Yan
Go back to this week's extra problem (without solutions) here.
Go back to this week's lab solutions here.
This optional problem is an opportunity to further exercise your understanding of strings.
Code Study: atoi
(takes about 20-30min)
In assign0, you used the atoi function to convert a string argument to an integer. The function name comes from ascii to integer. As you found out in assign0, atoi is convenient to use, but not particularly robust. It has largely been superseded by strtol (used in assign1), which provides more features, but is more complicated.
Below is an implementation of atoi based on the implementation from musl. This code is included in the atoi.c program so you can compile and run it. Note that not all stylistic decisions below are ones you should emulate - we use this code mostly as a case study of how to understand core implementations.
1 int musl_atoi(const char *s) {
2 while (isspace(*s)) {
3 s++;
4 }
5
6 bool negative = false;
7 switch (*s) {
8 case '-':
9 negative = true;
10 case '+':
11 s++;
12 }
13
14 /* Compute as a negative number to avoid overflow on INT_MIN */
15 int number = 0;
16 while (isdigit(*s)) {
17 number = 10 * number - (*(s++) - '0');
18 }
19
20 return negative ? number : -number;
21 }
Let's trace through the function. Here are some questions to guide you:
- Q: [Line 2] What does the expression
*sevaluate to? What is another equivalent way to write this expression? (Perhaps this alternate way would be preferred for readability, but alas...)Dereferencing a string (i.e.,
*sforchar *s) is identical to getting the first character in a string with array syntax,s[0]. - Q: [Line 3] What does the expression
s++do tos?s++advances thespointer by one character. Together, Lines 2 and 3 advance thescharacter past any space characters at the start of the string. - [Lines 6-12] If the string begins with a leading minus, at what point does it advance
spast that char? This is subtle! (The fallthrough behavior of a switch statement can be surprising and often triggered by accident, so a comment noting there is intentionally no break statement would probably have been a good idea, but alas...)If it finds a "-" sign, it updates the boolean to reflect this, and also increments s to advance past that character. This is because switch statement cases "fall through" by default, unless they hit a
breakstatement, and continue executing the case code below them. If it finds a "+" sign, it just increments s to advance past that character. - The code has two loops that advance along the string, but neither has an explicit test to stop at the null terminator. Q: Why is this not necessary?
The null terminator character
'\0'is neither a space nor a digit. If*swere the null terminator, both loop conditionals would evaluate to false. - Q: How is a single digit character converted to its numeric value? How is that value combined with the number being built up so far?
After initializing the numeric value to zero, it loops over each character in the remaining string as long as the character is numeric; as soon as it finds a non-digit, the function stops and returns whatever value it has constructed so far. It constructs the number one digit at a time; it sets
number's new value to be its previous value times 10, minus the new digit. - [Line 17] One quirky note about
s++is that the expression not only modifiess, but it also evaluates to the original value of s (Check out this link for more about this quirky syntax). In other words,s++will modifysbut then evaluate tosrather than the incremented value. Q: How is this used to make the string processing more concise?In one line, we can get the current character pointed to by
sand update the pointer for the next iteration of the loop. - [Lines 15-20] The loop builds up
nas a negative value and later negates it. The comment indicates this choice avoids overflow onINT_MIN. Q: Why doesINT_MINseem to necessitate its own special case? Why does atoi construct a numeric value as a negative number instead of a positive number?In the
intdatatype, there is one more representable negative number (INT_MIN) than positive number; if the function constructs all negative numbers, then all positive numbers have a representable negative equivalent and Line 20 handles the sign flip. - Q: What happens when the input string contains a non-digit char? What if that non-digit char is the very first character?
If it's a space,
atoiwill move to the next character. If it's any other character,numberis initialized at 0, and soatoiwill return 0. - [Line 20] Note that the ternary expression (with the question mark and colon) is evaluated using the form "CONDITION ? IF_TRUE : IF_FALSE". Q: Try applying this to interpret the return statement above.
If the number is negative, return the currently constructed number; otherwise, return the negative of the currently constructed number (so the result would necessarily be positive, based on lines 15-18).
Compile the included atoi program and do some testing. Identify an input of interest and first manually trace how the function will process it, then step through the program in gdb to verify your understanding is correct. Try some inputs that should successfully convert, e.g. ./atoi 107 0 -12, then consider inputs that are problematic or malformed, e.g. ./atoi 3.14 binky @55 --8. What about an input that is outside the representable range of integer? If the string is a number larger than INT_MAX, will it overflow, saturate, return 0, raise an error, or do something else entirely?