Assignment 2: C Strings

Due: Wed Oct 19 11:59 pm
Late submissions accepted until Fri Oct 21 11:59 pm

Assignment by Julie Zelenski, with modifications by Nick Troccoli, Katie Creel, Brynne Hurst and Jonathan Kula

Learning Goals

This assignment covers topics in recent string lectures and the second lab. You will be building your skills with:

  • C-strings (both raw manipulation and using string library functions)
  • viewing Unix utility programs from an internal perspective - as an implementer, not just a client
  • exposure to programmatic access of the filesystem and shell environment variables
  • thoroughly documenting your code, and learning about the importance of good documentation

You shouldn't need material related to heap allocation for this assignment - in other words, this assignment focuses just on the material related to strings and pointers.

Overview

For this assignment, you will write programs that replicate some of the functionality of the Unix commands printenv and which. This is an especially appropriate way to learn more about C and Unix; implementing the Unix operating system and its command-line tools was what motivated the creation of the C language in the first place! Implementing these programs is a very natural use of C, and you'll see how comfortably it fits in this role. Moreover, when we interact with the filesystem programmatically in C, as we will do in part of this assignment, we can use a C string to represent a path (like /a/b/c) and can construct and dissect paths by string manipulation and string.h library functions. Working with paths is thus practice with C strings!

This assignment asks you to complete two functions and one program. Each part gives practice with string manipulation:

  • get_env_value has you extract a specific value from a list of strings
  • scan_token has you implement and document an improved version of strtok from lab2
  • mywhich has you use your get_env_value and scan_token functions to print out the location of an executable on the filesystem

A few reminders:

  • The working on assignments page contains info about the assignment process.
  • The collaboration policy page outlines permitted assignment collaboration, emphasizing that you are to do your own independent thinking, design, coding, and debugging. If you are having trouble completing the assignment on your own, please reach out to the course staff; we are here to help!

To get started on the assignment, clone the starter project using the command

git clone /afs/ir/class/cs107/repos/assign2/$USER assign2

The starter project contains the following:

  • readme.txt: a text file where you will answer questions for the assignment
  • util.c, mywhich.c and Makefile: two files that you will modify, and their Makefile for compiling
  • custom_tests: the file where you will add custom tests for your programs
  • myprintenv.c: a program that calls your get_env_value function in util.c for testing purposes. You do not need to modify this file.
  • tokenize.c: a program that calls your scan_token function in util.c for testing purposes. You do not need to modify this file.
  • samples: a symbolic link to the shared directory for this assignment. It contains:
    • SANITY.ini, sanity.py and prototypes.h: files to configure and run Sanity Check. You can ignore these.
    • myprintenv_soln, mywhich_soln and tokenize_soln: executable solutions for the programs you will write.
  • tools: contains symbolic links to the codecheck, sanitycheck, and submit programs for basic style-checking, testing, and submitting your work.

Assignment Support: Through TA helper hours and the discussion forum, our focus will be on supporting you so that you can track down your own bugs. Please ask us how to best use tools (like GDB and the brand-new Valgrind!), what strategies to consider, and advice about how to improve your debugging process or track down your bug. We're happy to help you with these in order to help you drive your own debugging. For this reason, if you have debugging questions during helper hours, please make sure to gather information and explore the issue on your own first, and fill out the QueueStatus questions with this information. For instance, if your program is failing in certain cases, try to find a small replicable test case where the program fails; then, try using GDB to narrow down to what function/code block/etc. you think is causing the issue; then, further investigate just that area to try and find the bug. As another example, if your program is crashing, take a similar approach, but try using GDB and the backtrace command (check it out in lab!) to narrow in on the source of the crash. This information is important for the course staff to effectively help you with debugging. Starting with a future assignment, we will require this information when signing up for helper hours for debugging help, so please make sure to provide as much information as possible.

Working With Strings

For this assignment, use of getenv/secure_getenv and strtok/strsep is prohibited since you are writing your own versions of those functions, but the rest of the standard library is at your disposal and its use is strongly preferred over re-implementing its functionality. The functions in the standard library are already written, tested, debugged, and highly optimized. What's not to like? One important consideration, though, is to choose the appropriate function to use. As one example, there are several different functions that do variants of string compare/search (strstr, strchr, strncmp, strspn and so on). While working on this assignment, be sure to choose the approach that most directly accomplishes the task at hand.
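As a small illustration of matching the function to the task, the sketch below checks whether a string begins with a given prefix. The helper names has_prefix and contains are hypothetical and not part of the starter code. strncmp compares only the leading characters, whereas strstr also reports a match in the middle of the string, so the two answer different questions.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper (not in the starter code): true if s begins with prefix.
bool has_prefix(const char *s, const char *prefix) {
    // strncmp looks only at the first strlen(prefix) characters of s.
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

// For contrast: true if needle occurs anywhere in s, not just at the start.
bool contains(const char *s, const char *needle) {
    return strstr(s, needle) != NULL;
}
```

For the string "say hello", contains finds "hello" but has_prefix does not, which is the kind of difference to keep in mind when picking a library function.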

Something else that appears on this assignment is the const keyword; a const char * means that the characters pointed to by this pointer cannot be changed. It also means that if you create another pointer to point to these same characters, it must also be const; think of const like part of the variable type itself. You can, however, reassign the const char * to point to something else; it is just that you are not able to change the characters at the location to which it points. In other words, const char * (and const char **, const char ***, and so on) mean the characters at the location ultimately being referred to cannot be modified, but any pointer on the way there can be modified. Also, it's usually okay to use a non-const pointer for a const pointer argument or variable (no cast required or recommended) - e.g., for strlen(const char *), though its parameter type is technically const char *, we can pass in non-const char *s without casting. But the inverse (supplying a const pointer where a non-const is expected) will raise a warning from the compiler and is likely to result in problems. Here are some examples:

// cannot modify this char
const char c = 'h';
// cannot modify chars pointed to by str
const char *str = ...
// cannot modify chars pointed to by *strPtr
const char **strPtr = ...


char buf[6];
strcpy(buf, "Hello");
const char *str = buf;

// not allowed
str[0] = 'M';

// allowed!
str = "Mello";

// not allowed
str[1] = 'a';

// allowed!
buf[0] = 'M';

If you get compiler warnings about initialization discards 'const' qualifier from pointer target type, it means that the "const-ness" does not match; make sure you follow the rules above for your variable declarations and preserve const-ness where needed.
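To illustrate the rule about passing a non-const pointer where a const pointer is expected, here is a minimal sketch. count_char and count_in_buffer are hypothetical functions used only for this example. Because count_char promises not to modify its argument, callers may pass either a char * or a const char * without a cast.

```c
#include <stddef.h>

// Hypothetical example function: counts occurrences of c in s.
// The const in the parameter type promises that s's characters are not modified.
size_t count_char(const char *s, char c) {
    size_t count = 0;
    for (const char *cur = s; *cur != '\0'; cur++) {  // cur must also be const
        if (*cur == c) {
            count++;
        }
    }
    return count;
}

// A caller may pass a modifiable buffer with no cast.
size_t count_in_buffer(void) {
    char buf[6] = "Hello";
    buf[0] = 'J';                 // fine: buf itself is not const
    return count_char(buf, 'l');  // char * converts to const char * implicitly
}
```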

Testing

This assignment heavily emphasizes testing. For each of the 2 functions and for the mywhich program you write below, you should also add at least 3 to 5 additional tests of your own (total of at least 10) in the custom_tests file that show thoughtful effort to develop comprehensive test coverage. When you add a test, also document your work by including comments in the custom_tests file that explain why you included each test and how the tests relate to one another. The tests supplied with the default SanityCheck are a start that you should build on, with the goal of finding and fixing any bugs before submitting, just like how a professional developer is responsible for vetting any code through comprehensive testing before adding it to a team repository. We recommend you run tests early and often (and remember, running tests even makes a snapshot of your code to guard against editing mishaps!). You can also find suggested testing strategies on the testing page, linked to from the Assignments dropdown.

The best way to approach testing on this assignment is:

  1. Understand the expected program behavior
  2. BEFORE writing code, write some tests that cover various cases you can think of
  3. Write your code
  4. Write more tests to cover additional cases

This is because once you start writing code, you may start to think in terms of how your code works rather than how the code should work, meaning if you omit handling a case in your code, you may also omit covering that case in your testing. Thus, a good strategy is to write some tests before implementing anything, and then as you implement, you can add further tests. Use the tests as a way to gauge your progress and uncover bugs! We provide some testing recommendations in each problem section below.

Background: Unix Filesystem and the Shell Environment

In this assignment, you will write code that interacts with the Unix filesystem and something called shell environment variables. If you need an introduction or refresher on the filesystem, review our Unix guide for tutorials on the tree structure, absolute and relative paths, and various commands to interact with files and directories as a user.

We made a video explaining some of the background information about Unix and the terminal that's necessary for this assignment - make sure to watch it before continuing!

As mentioned in the video above, on a Unix system, programs run in the context of the user's "environment". The environment is a list of key-value pairs that provide information about the terminal session and configure the way processes behave. You have already used the USER environment variable when cloning your assignment repo; USER is set to your SUNet ID when you log into myth. Other variables include PATH (where the system looks for programs to run), HOME (path to your home directory), and SHELL (your command line interpreter).

Explore your environment by trying out the printenv and env commands mentioned in the video, and reading their manual pages. You will be implementing a core part of the printenv program as part of the assignment. As a summary:

  • printenv will show your environment variables. Run printenv with no arguments to see your entire environment. Then try printenv USER SHELL HOME. What is the output from a bad request like printenv BOGUS?
  • env is a command that allows you to temporarily change environment variables. You can execute something like:
env BINKY=1 OTHERARG=2 ./myprogram

and myprogram will be executed in a temporary environment with all of the original environment variables, plus BINKY set to 1 and OTHERARG set to 2. To see this, run printenv, then run env BINKY=1 WINKY=2 printenv. What changes between the two?

You can also use env with GDB; e.g. if you want to debug a program that is run using env, start gdb prefixed with env, and then run as normal - for instance: env USER=otheruser gdb myprogram

Before moving on: make sure you have understood what environment variables are and what the printenv program does. Also make sure you're familiar with how to use the env command; this will be essential for thorough testing!

1. get_env_value

Your first task is to implement the function get_env_value in util.c, with the following signature:

const char *get_env_value(const char *envp[], const char *key);

The get_env_value function takes in two parameters: envp, an array of environment variable strings, and key, the name of a specific variable of interest. Each element in envp is a string of the form "KEY=VALUE", e.g. "USER=someuser", and envp contains a NULL pointer as its last element - we can use this to know when we have reached the end of the array. Your function should search the array for the element corresponding to key and return its value, or return NULL if it was not found in the array. For example, assume envp looks like the following:

["USER=someuser", "VAR1=VALUE1", "VAR2=VALUE2", NULL]

If we called get_env_value with this array as envp and the following values for key, this is what would be returned:

if key is "USER", the function would return "someuser"
if key is "VAR1", the function would return "VALUE1"
if key is "NOTTHERE", the function would return NULL
if key is "VAR", the function would return NULL

Your function must iterate through the envp array in search of a matching entry, and if it finds the matching entry (an exact match with the variable name), should return a pointer to the portion of the string following the '=' character. It should not make a copy of the value string.

For each entry in the envp array, you can assume that neither KEY nor VALUE will contain an =.
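Two ideas in this description can be sketched in isolation: walking a NULL-terminated array of strings, and using strchr to obtain a pointer into an existing string rather than a copy. The helper names below are hypothetical, and this is not a complete get_env_value; it only shows the building blocks.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: counts the entries before the terminating NULL pointer.
size_t count_entries(const char *envp[]) {
    size_t n = 0;
    while (envp[n] != NULL) {
        n++;
    }
    return n;
}

// Hypothetical helper: returns a pointer to the text after the first '=' in
// entry, or NULL if entry has no '='. The result points into entry itself;
// no characters are copied.
const char *after_equals(const char *entry) {
    const char *eq = strchr(entry, '=');
    return eq == NULL ? NULL : eq + 1;
}
```

Note that after_equals alone does not decide whether an entry matches a given key; comparing the name portion exactly (so that "VAR" does not match "VAR1=VALUE1") is a separate step.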

Testing

This function can be tested in isolation with the provided myprintenv.c program, which you do not need to modify or comment, but whose code you should read in order to understand how it calls your get_env_value function. myprintenv behaves similarly to printenv in that you can specify one or more environment variable names as command line arguments and it will print out the value of each of those. You can run myprintenv without arguments to print out all environment variables. You can also write sanitycheck custom tests with myprintenv.

We recommend using the env command to help with testing, both manually and in sanitycheck custom tests! For instance, if you execute

env USER=otheruser ./myprintenv USER

this will change the USER environment variable to otheruser, instead of your SUNet ID, just for this one execution of myprintenv. You can then ensure that myprintenv prints out the correct value, otheruser.

Note that there is one special environment variable, "_", whose set value will always differ between your solution and the sample solution. If you wish to test looking up the value for "_", use env to set it temporarily to a different value.

Before moving on: make sure you have thoroughly tested your get_env_value function, making sure to cover various different cases of possible inputs, and that you have written your custom tests. You will use this function later in the assignment, so it's vital to ensure it's thoroughly tested before moving on!

2. scan_token

Note: this problem requires material covered in lecture 6. Please make sure to review the lecture 6 material before beginning this problem.

Make sure you have also understood the strtok function as mentioned in the last lab. Understanding that will help significantly as you implement this function!

Your second task is to implement a function scan_token in util.c, with the following signature:

bool scan_token(const char **p_input, const char *delimiters,
                char buf[], size_t buflen);

scan_token is an improved version of strtok from lab2. Such a function to tokenize a string by delimiters is handy to have, but the standard strtok has design flaws that make it difficult to use. The intention of scan_token is to separate a string into tokens in the manner of strtok but with a significantly cleaner design.

scan_token takes in a pointer to a string and the delimiters to use to tokenize it. It writes the next token in the string into the specified buffer buf and returns true; if no more tokens are left, it returns false. Note that scan_token's first parameter is a double pointer: a pointer to a const char *. This is necessary because scan_token needs to change the char * itself to advance past the characters it has already scanned. The caller will thus have to call it several times to tokenize the entire string. Here is an example:

const char *input = "super-duper-awesome-magnificent";
char buf[10];
const char *remaining = input;

while (scan_token(&remaining, "-", buf, sizeof(buf))) {
    printf("Next token: %s\n", buf);
}
// once we get here, `remaining` is the empty string

Running the above code produces this output:

Next token: super
Next token: duper
Next token: awesome
Next token: magnifice
Next token: nt
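The reason for the double pointer can be seen in a simpler setting. In the sketch below, skip_chars is a hypothetical function (not part of the assignment) that advances the caller's pointer. Because C passes arguments by value, the function must receive the address of the pointer in order to change it.

```c
#include <stddef.h>

// Hypothetical example: advances *p_str by up to n characters, stopping early
// at the null terminator. Returns how many characters were skipped.
// The characters are const; the caller's pointer *p_str is what changes.
size_t skip_chars(const char **p_str, size_t n) {
    size_t skipped = 0;
    while (skipped < n && **p_str != '\0') {
        (*p_str)++;   // modifies the caller's pointer, not the characters
        skipped++;
    }
    return skipped;
}
```

A caller would write skip_chars(&cur, 2) for some const char *cur; afterwards cur points two characters further along, while the underlying string is unchanged.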

The function should be implemented as follows, using appropriate string.h functions (see our standard library guide) - the first two steps borrow from how strtok is implemented:

  1. scan the input string to find the first character not contained in delimiters. This marks the beginning of the next token.
  2. scan from that point to find the first character contained in delimiters. This delimiter (or the end of the string if no delimiter was found) marks the end of the token.
  3. write this token as a valid C string to buf, which has space for buflen characters. scan_token should not write past the end of buf.
    • If a token does not fit in buf, the function should write buflen - 1 characters into buf and write a null terminator in the last slot.
  4. update the pointer pointed to by p_input to point to the next character in the input that follows what was just scanned.
    • If the scanned token consumed all of the remaining input, *p_input should point to the input's null terminator.
    • If the scanned token was too big to fit entirely in buf, then *p_input should point to the character in the input immediately after the buflen - 1 characters that fit in buf. In other words, the next token scanned will start at the first character that would have overflowed buf.
  5. return true if a token was written to buf, and false otherwise.
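Steps 1 through 3 map naturally onto strspn, strcspn, and a bounded copy. The sketch below shows those pieces separately; the helper names are hypothetical, and combining them and updating *p_input as in step 4 is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper for steps 1-2: returns a pointer to the first character
// of input that is not a delimiter, and stores in *p_len the length of the run
// of non-delimiter characters starting there (0 if there is no token).
const char *find_token(const char *input, const char *delimiters, size_t *p_len) {
    const char *start = input + strspn(input, delimiters);  // skip delimiters
    *p_len = strcspn(start, delimiters);  // up to next delimiter or end of string
    return start;
}

// Hypothetical helper for step 3: copies at most buflen - 1 of the len
// characters at src into buf, null-terminates, and returns the number copied.
// Assumes buflen >= 1.
size_t copy_bounded(char buf[], size_t buflen, const char *src, size_t len) {
    size_t n = len < buflen - 1 ? len : buflen - 1;
    memcpy(buf, src, n);
    buf[n] = '\0';
    return n;
}
```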

scan_token should not emulate the bad parts of strtok's design. Specifically, it should not use static or global variables and should not modify the input string's characters.

You may assume the following about the parameters to scan_token:

  • buf is always a valid address to a region of memory that has space for buflen characters
  • buflen is always greater than 1
  • p_input is always a valid pointer to a pointer
  • *p_input is always a well-formed (i.e. null-terminated) C-string. (It may be the empty string.)
  • delimiters is always a well-formed C-string containing one or more delimiter chars. (i.e. it will never be the empty string)

Note that even if you wish to add checking for some of these assumptions, e.g. determining whether p_input is valid, or that buf actually has buflen characters of space, it's tough to do. Determining whether a pointer is valid, for instance, is not solvable in general, and any measure to detect bad pointers will be half-hearted at best. As the implementer, at times you have little choice but to clearly document your assumptions and assume the client will adhere to them, and write your code accordingly.

Testing

This function can be tested in isolation with the provided tokenize.c program, which you do not need to modify or comment, but whose code you should read over to understand how it calls your scan_token function. You can also write sanitycheck custom tests with tokenize.

If you execute ./tokenize, it will use your scan_token function to calculate the number of syllables of various test words. You can also run it by specifying other text you would like to use to test, in this format:

./tokenize <DELIMITERS> <TEXT> <BUFSIZE (OPTIONAL)>

For example, if you would like to tokenize the text "hello I am a C-string" using the delimiters "-" and " ", you could run:

./tokenize " -" "hello I am a C-string"

The first string contains the characters to use as delimiters, and the second string is the text to tokenize. This command should output something like:

./tokenize " -" "hello I am a C-string"
Tokenized: { "hello" "I" "am" "a" "C" "string" }
remaining:

You may optionally specify a third argument which is the size of the buffer to pass when tokenizing. If you do not include this command line argument, the buffer is sized to always have enough space to store the entire token.

Before moving on: make sure you have thoroughly tested your scan_token function, making sure to cover various different cases of possible inputs, and that you have written your custom tests. You will use this function later in the assignment, so it's vital to ensure it's thoroughly tested before moving on!

3. Documenting scan_token

When functions have assumptions, limitations or flaws, it is vital that the documentation makes those clear. Otherwise, developers don’t have the information they need to make good decisions when writing their programs. For example, one of the design flaws of strtok is that it modifies the characters of its first argument. Luckily, this is documented in the BUGS section of the man page (though it should perhaps be emphasized more than just as a minor detail). If we were unaware of this flaw, we might assume the argument wasn't modified, breaking other parts of our program or even introducing potential vulnerabilities.

For this next part of the assignment, your task is to write a "manual page" for your scan_token function. Function documentation like this is different than comments in your actual program code. While header, inline and other comments should be brief and standalone, a manual page reference is more thorough and cohesive. In manual pages with multiple sections, text at the beginning of a section should explain some of the concepts, and should often make some general points that apply to several functions or variables. Additionally, manual page documentation should be written more formally than code comments. As the GNU standard explains, "the only good way to use [code comments] in writing a good manual is to use them as a source of information for writing good text."

In your readme.txt file, we have provided a template outline for your manual page. Fill in the remaining components to fully document your scan_token function. Here is the starter template, for reference:

scan_token DOCUMENTATION
INSTRUCTIONS: Fill in the sections marked with a TODO below.
Your documentation should be original (i.e., please do not copy and paste from the assignment spec).
NAME
    scan_token - # TODO write a one-sentence description of scan_token
    bool scan_token(const char **p_input, const char *delimiters,
                    char buf[], size_t buflen);
ARGUMENTS
    const char **p_input - #TODO: write one sentence explaining the p_input argument
    const char *delimiters - #TODO: write one sentence explaining the delimiters argument
    char buf[] - #TODO: write one sentence explaining the buf argument
    size_t buflen - #TODO: write one sentence explaining the buflen argument
RETURN VALUE
    #TODO: write a 1-3 sentence description of the possible return values of scan_token.
    Make sure to include a description of what will be stored in the buf argument upon return.
ASSUMPTIONS
    #TODO: write 2-5 sentences explaining the assumptions made by your scan_token function.

    Here is an example: The scan_token function assumes that the buf argument
    has space for buflen characters.
DESCRIPTION
    #TODO: write one paragraph explaining the implementation of your scan_token function.
    This section should include (high-level) implementation details. You can use your function-header
    comment as a starting point for this section.

Tip: when you need to use scan_token later on in your mywhich program, try referring to just the manual page you wrote here. If you find that you need more information in order to effectively use the function, consider adding what might be missing. Your goal for your manual page reference should be that a client can effectively use your function without seeing the code, just like how you can use string functions without seeing their implementations.

4. mywhich

Your final task is to use your scan_token and get_env_value functions to implement the mywhich.c program, which is a simplified version of the Unix which command. It takes the names of executables (e.g. make, cat, emacs, etc.) and prints out their filesystem locations. Read the man page for the Unix version (man which) if you'd like, though note that your mywhich program will differ a bit from the full which behavior. Try out the provided sample solution, e.g. ./samples/mywhich_soln ls or ./samples/mywhich_soln make. For each command name, it prints the full path to the first matching executable it finds or nothing if no matching executable was found. The matched executables are listed one per line in the order that the command names were specified on the command-line. In the example below, two of them were found, but no executable named submit was found in any directory in the user's PATH and thus nothing was printed for it.

myth$ ./samples/mywhich_soln emacs submit cp
/usr/bin/emacs
/usr/bin/cp

If no command line arguments are specified, mywhich prints out the directories in the search path, one per line.

This search is intimately related to how commands are executed by the shell. When you run a command such as ls or emacs, the shell searches for an executable program that matches that command name and then runs that program.

Where does it search for executables? You might imagine that it looks for an executable file named emacs in every directory on the entire filesystem, but such an exhaustive search would be both inefficient and insecure. Instead, it looks just directly inside those directories that have been explicitly listed in the user's PATH environment variable. The value for PATH is a sequence of directories separated by colons such as PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/games. When looking for a command, which considers the directories in the order they are listed in PATH and stops at the first one that contains a matching executable. In other words, here mywhich would first see if /usr/local/bin/emacs exists. If it does, mywhich prints it out and stops. If it doesn't, it would then check for /usr/bin/emacs. Then /bin/emacs. And so on. There is a library function called access that will come in handy (more on this later), which can tell you whether a given executable path is valid. Note that this process isn't doing an exhaustive search of all files directly or indirectly contained in a path like /usr/local/bin. It's just seeing if the specified executable name, e.g. emacs, exists directly inside the specified path location, e.g. /usr/local/bin/emacs.

PATH is set by default in your environment to include directories such as /usr/local/bin/ and /usr/bin/ which house the executable files for the standard Unix commands. (The name bin is a nod to the fact that executable files are encoded in binary). For ease of testing, mywhich also supports using the environment variable MYPATH, if it is specified, so that you can customize the path contents without changing your PATH environment variable (which may break other shell functionality). You can specify it with the env command, like this:

myth$ env MYPATH=/tmp:tools ./mywhich submit
tools/submit

There are several core string tasks in this program:

  • Getting the value of the MYPATH or PATH environment variable - the starter code uses your get_env_value function to do this
  • Tokenizing the specified path to get each individual location you need to search - you will use your scan_token function to do this
  • creating the full path that you wish to check - e.g. taking an individual location like /usr/local/bin and an executable name like emacs and constructing a path with the concatenation of the individual location, a forward slash, and the executable name: /usr/local/bin/emacs. Then you can pass that path as a parameter to the access function to check if it's valid.
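The third task, joining a directory and an executable name, can be sketched with snprintf, which never writes more than the given number of bytes. build_path is a hypothetical helper name; the starter code may organize this differently, and other string.h functions would work as well.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: writes "dir/name" into buf (capacity buflen bytes,
// including the null terminator). Returns false if the result did not fit.
bool build_path(char buf[], size_t buflen, const char *dir, const char *name) {
    int needed = snprintf(buf, buflen, "%s/%s", dir, name);
    // snprintf returns the length the full string would have had.
    return needed >= 0 && (size_t)needed < buflen;
}
```

For example, calling build_path with "/usr/local/bin" and "emacs" on a sufficiently large buffer leaves /usr/local/bin/emacs in that buffer.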

Starter Code

mywhich.c is given to you with an incomplete main function that handles the case when mywhich is invoked with no arguments. You should first read and understand this code, work out how to change/extend it to suit your needs, and finally add comments to document your strategy.

Note that you can (and are encouraged to!) change code in mywhich as you wish, to decompose it, etc. Your goal should be to have your main function act as a concise summary of your overall program.

Some concepts to think about when looking over the code:

  • When applied to an array, the sizeof operator conveniently returns the actual size of the array. However, as soon as that array is passed as a parameter (it becomes a pointer to the first element) or as soon as we create a pointer to any of its elements, sizeof of that pointer will return 8 bytes instead of the array size because a pointer is 8 bytes. Additionally, note that the array size is not necessarily the same as the string length if it is a string.
  • If the user's environment does not contain a value for MYPATH, what does mywhich use instead?
  • How does a client properly use scan_token? (see sample uses in both tokenize.c and mywhich.c)
  • Do you see anything unexpected or erroneous? We intend for our code to be bug-free; if you find otherwise, please let us know!
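The first point above, about sizeof, can be checked directly. In this sketch (the function names are hypothetical), sizeof reports the full array size only where the array declaration is in scope; inside a function that receives the array, only a pointer is visible.

```c
#include <stddef.h>
#include <string.h>

// Receives a pointer; sizeof here measures the pointer, not the caller's array.
size_t size_seen_by_callee(const char *str) {
    return sizeof(str);
}

// Where the array is declared, sizeof reports the whole array in bytes,
// which can differ from the length of the string stored in it.
size_t array_size_minus_strlen(void) {
    char buf[32] = "emacs";
    return sizeof(buf) - strlen(buf);   // 32 - 5
}
```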

The code we provide has been stripped of its comments and it's your job to provide the missing documentation.

Implementation

The program should be implemented as follows:

  1. If there are no command line arguments, the program prints the directories in the search path, one directory per line. This is already implemented for you in the starter code.
  2. If there are command line arguments, the program searches for a matching executable for each argument in the order they were specified, and prints the full path to the first matching executable it finds or nothing if no match was found. To do this, for each argument:
    • Take the specified path (the value for MYPATH, if it exists, or for PATH otherwise) and tokenize it using your scan_token function and a buffer of size PATH_MAX. PATH_MAX is the system's limit on the maximum length of a full path (including the null terminator). For each token (which is a single directory path):
      • use that large buffer to construct the full path: e.g. if the token is /usr/local/bin and the executable name is emacs, you want to construct the path /usr/local/bin/emacs. You may assume the constructed path will fit in the PATH_MAX-sized buffer.
      • use the access function to check if that executable path is valid.
        • If it is, print out that path and move on to processing the next command line argument.
        • If it's not, try searching again with the next token

Note that you should not store all the path tokens in an array while tokenizing - you should perform the searches as you tokenize. For this reason, note that if there are multiple command line arguments, you will repeat the tokenization of the search path for each argument, and that's fine. You may assume that the user's MYPATH / PATH variables are always well-formed sequences of one or more paths separated by colons.

Here's more information about the access function: access is a built-in function that is part of the POSIX standards, which establish a set of C functions for interacting with the operating system. Whereas the standard C library functions provide only simple file reading/writing, the POSIX functions add more comprehensive services, including access to filesystem metadata (e.g. modification time, who can access files), directory contents, and filesystem operations that are necessary for implementing Unix commands like ls and mkdir, which are themselves just executable programs. The function access has the following signature:

int access(const char *pathname, int mode);

It takes in a path, pathname, and permissions, mode, and returns whether or not you have those permissions for the file at that path. To use access to check if an executable path is valid, we will be asking access to check whether we can read and execute the file at that executable path. If we can, it means an executable exists at that location. Otherwise, we assume none exists there.

Therefore, when you call access, the first parameter should be the constructed executable path you wish to check (e.g. /usr/local/bin/emacs), and the second parameter should be a bitmask that is a combination of the bitwise constants R_OK and X_OK (a value with the bits in both of these constants on). In this way, we specify that we want access to check if we have "read" and "execute" permissions for that file.
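
A minimal sketch of such a call follows. The wrapper name is_readable_executable is hypothetical and only for illustration; the comparison against 0 reflects the documented behavior of access, which returns 0 when every requested permission is granted and -1 otherwise, but you should still confirm the details in the man page.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical wrapper (not part of the starter code): reports whether the
 * calling process may both read and execute the file at pathname. */
bool is_readable_executable(const char *pathname) {
    assert(pathname != NULL);
    /* R_OK | X_OK has the bits of both constants on, so a single call
     * checks for read and execute permission together. */
    return access(pathname, R_OK | X_OK) == 0;
}
```

For example, on a typical Unix system is_readable_executable("/bin/sh") would be expected to be true, while a path that does not exist yields false.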

Be sure to carefully read the man page so you know how to properly interpret the return value from a call to access!

Testing

You can write custom sanitycheck tests for mywhich - we recommend using env and MYPATH to easily specify custom search paths.

Submitting

Once you are finished working and have saved all your changes, check out the guide to working on assignments for how to submit your work. When you submit, you may optionally indicate that you do not plan to make a submission after the on-time deadline. This allows the staff to start grading some submissions as soon as the on-time deadline passes, rather than waiting until after the late period to start grading.

  • When in doubt, it's fine to indicate that you may make a late submission, even if you end up submitting on time
  • If you do indicate you won't submit late, this means once the on-time deadline passes, you cannot submit again. You can resubmit any time before the on-time deadline, however.
  • If you want to change your decision, you can do so any time before the on-time deadline by resubmitting and changing your answer.
  • If you know that you will not make a late submission, we would appreciate you indicating this so that we can grade assignments more quickly!

You only need to modify the following files for this assignment: util.c, mywhich.c, custom_tests, readme.txt

We would also appreciate it if you filled out this homework survey to tell us what you think once you submit. We appreciate your feedback!

Grading

Below is the tentative grading rubric. We use a combination of automated tests and manual review to evaluate your submission. More details are given in our page linked to from the Assignments dropdown explaining how assignments are graded.

Readme (12 points)

Functionality (82 points)

  • Sanity cases (25 points) Correct results on the default sanity check tests.
  • Comprehensive/stress cases (40 points) Correct results for additional test cases with broad, comprehensive coverage and larger, more complex inputs.
  • Clean compile (2 points) Compiles cleanly with no warnings.
  • Clean run under valgrind (10 points) Clean memory report(s) when run under valgrind. Memory errors (invalid read/write, etc) are significant deductions. Every normal execution path is expected to run cleanly with no memory errors or leaks reported. We will not test exception/error cases under Valgrind.
  • custom_tests (5 points) Your custom_tests file should include at least 10 tests of your own, 3-5 per program, that show thoughtful effort to develop comprehensive testing coverage. Please include comments that explain your choices. We will run your custom tests against your submission as well as review the cases to assess the strength of your testing efforts.

Code Quality (buckets weighted to contribute ~15 points)

The grader's code review is scored into a bucket per assignment part to emphasize the qualitative features of the review over the quantitative. The styleguide is a great overall resource for good program style. Here are some highlights for this assignment:

  • Using library functions where possible. If the C library provides functionality needed for a task, you should leverage these library functions rather than re-implement that functionality.
  • Use of pointers and memory. We expect you to show proficiency in handling pointers/memory, no unnecessary levels of indirection, correct use of pointee types and typecasts, and so on. For this program, you should not need and should not use dynamic memory (i.e. no malloc/free/strdup).
  • Program design. We expect your code to show thoughtful design and appropriate decomposition. Data should be logically structured and accessed. Control flow should be clear and direct. When you need the same code in more than one place, you should unify, not copy and paste.
  • Style and readability. We expect your code to be clean and readable. We will look for descriptive names, defined constants (not magic numbers!), and consistent layout. Be sure to use the most clear and direct C syntax and constructs available to you.
  • Documentation. You are to document both the code you wrote and what we provided (except for tokenize.c and myprintenv.c). We expect program overview and per-function comments that explain the overall design along with sparing use of inline comments to draw attention to noteworthy details or shed light on a dense or obscure passage. The audience for the comments is your C-savvy peer.

Post-Assignment Check-in

How did the assignment go for you? We encourage you to take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. Once you finish this assignment, you will have written your own implementation of a standard Unix utility program and an improved version of a standard library function, along with comprehensive documentation. That's a pretty darn impressive accomplishment, especially so given only a few weeks of learning about Unix and C -- wow!

To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to fill any gaps in your understanding before moving on from a task. They could also be useful as review before the exams.

  • The string library contains several functions to perform a form of string comparison, e.g. strncmp, strstr, strchr, strspn, ... Explain the differences between the functions and identify a situation in which each is appropriate.
  • Write a C expression that converts a hexadecimal digit char to its numerical value, i.e. '1' => 1, 'f' => 15.
  • The first parameter to the function scan_token is of type const char **. Explain the purpose of the extra level of indirection on that argument.
  • It is controversial (see section 13) whether to add . (the current directory) to your PATH. Why might it be convenient? Why does it introduce a security risk?
  • Why is good function documentation (like manual pages) critical for good software development?