Due: Thu Feb 2 11:59 pm
Late submissions accepted until Sat Feb 4 11:59 pm
Assignment by Julie Zelenski, with modifications by Nick Troccoli, Katie Creel, Brynne Hurst and Jonathan Kula
Learning Goals
This assignment covers topics in recent string lectures and the second lab. You will be building your skills with:
- C-strings (both raw manipulation and using string library functions)
- viewing Unix utility programs from an internal perspective - as an implementer, not just a client
- exposure to programmatic access of the filesystem and shell environment variables
- thoroughly documenting your code, and learning about the importance of good documentation
You shouldn't need material related to heap allocation for this assignment - in other words, this assignment focuses just on the material related to strings and pointers.
Overview
For this assignment, you will write programs that replicate some of the functionality of the Unix commands printenv and which. This is an especially appropriate way to learn more about C and Unix; implementing the Unix operating system and its command-line tools was what motivated the creation of the C language in the first place! Implementing these programs is a very natural use of C, and you'll see how comfortably it fits in this role. Moreover, when we interact with the filesystem programmatically in C, as we will do on part of this assignment, we can use a C string to represent a path (like /a/b/c) and can construct and dissect paths by string manipulation and string.h library functions. Working with paths is thus practice with C strings!
This assignment asks you to complete two functions and one program. Each part gives practice with string manipulation:
- get_env_value has you extract a specific value from a list of strings
- scan_token has you implement and document an improved version of strtok from lab2
- mywhich has you use your get_env_value and scan_token functions to print out the location of an executable on the filesystem
A few reminders:
- The working on assignments page contains info about the assignment process.
- The collaboration policy page outlines permitted assignment collaboration, emphasizing that you are to do your own independent thinking, design, coding, and debugging. If you are having trouble completing the assignment on your own, please reach out to the course staff; we are here to help!
To get started on the assignment, clone the starter project using the command
git clone /afs/ir/class/cs107/repos/assign2/$USER assign2
The starter project contains the following:
- readme.txt: a text file where you will answer questions for the assignment
- util.c, mywhich.c and Makefile: two files that you will modify, and their Makefile for compiling
- custom_tests: the file where you will add custom tests for your programs
- myprintenv.c: a program that calls your get_env_value function in util.c for testing purposes. You do not need to modify this file.
- tokenize.c: a program that calls your scan_token function in util.c for testing purposes. You do not need to modify this file.
- samples: a symbolic link to the shared directory for this assignment. It contains:
  - SANITY.ini, sanity.py and prototypes.h: files to configure and run Sanity Check. You can ignore these.
  - myprintenv_soln, mywhich_soln and tokenize_soln: executable solutions for the programs you will write.
- tools: contains symbolic links to the codecheck, sanitycheck, and submit programs for basic style-checking, testing, and submitting your work.
Assignment Support: Through TA helper hours and the discussion forum, our focus will be on supporting you so that you can track down your own bugs. Please ask us how to best use tools (like GDB and the brand-new Valgrind!), what strategies to consider, and advice about how to improve your debugging process or track down your bug. We're happy to help you with these in order to help you drive your own debugging. For this reason, if you have debugging questions during helper hours, please make sure to gather information and explore the issue on your own first, and fill out the QueueStatus questions with this information. For instance, if your program is failing in certain cases, try to find a small replicable test case where the program fails; then, try using GDB to narrow down to what function/code block/etc. you think is causing the issue; then, further investigate just that area to try and find the bug. As another example, if your program is crashing, take a similar approach, but try using GDB and the backtrace command (check it out in lab!) to narrow in on the source of the crash. This information is important for the course staff to effectively help you with debugging. Starting with a future assignment, we will require this information when signing up for helper hours for debugging help, so please make sure to provide as much information as possible.
Working With Strings
For this assignment, use of getenv/secure_getenv and strtok/strsep is prohibited since you are writing your own versions of those functions, but the rest of the standard library is at your disposal and its use is strongly preferred over re-implementing its functionality. The functions in the standard library are already written, tested, debugged, and highly optimized. What's not to like? One important consideration, though, is to choose the appropriate function to use. As one example, there are several different functions that do variants of string compare/search (strstr, strchr, strncmp, strspn and so on). While working on this assignment, be sure to choose the approach that most directly accomplishes the task at hand.
Something else that appears on this assignment is the const keyword; a const char * means that the characters pointed to by this pointer cannot be changed through this pointer. It also means that if you create another pointer to point to these same characters, it must also be const; think of const like part of the variable type itself. You can, however, reassign the const char * to point to something else; it is just that you are not able to change the characters at the location to which it points. In other words, const char * (and const char **, const char ***, and so on) mean the characters at the location ultimately being referred to cannot be modified, but any pointer on the way there can be modified. Also, it's usually okay to use a non-const pointer for a const pointer argument or variable (no cast required or recommended) - e.g., for strlen(const char *), though its parameter type is technically const char *, we can pass in non-const char *s without casting. But the inverse (supplying a const pointer where a non-const is expected) will raise a warning from the compiler and is likely to result in problems. Here are some examples:
// cannot modify this char
const char c = 'h';
// cannot modify chars pointed to by str
const char *str = ...
// cannot modify chars pointed to by *strPtr
const char **strPtr = ...
char buf[6];
strcpy(buf, "Hello");
const char *str = buf;
// not allowed
str[0] = 'M';
// allowed!
str = "Mello";
// not allowed
str[1] = 'a';
// allowed!
buf[0] = 'M';
If you get compiler warnings about initialization discards 'const' qualifier from pointer target type, it means that the "const-ness" does not match; make sure you follow the rules above for your variable declarations and preserve const-ness where needed.
Testing
This assignment heavily emphasizes testing. For each of the 2 functions and for the mywhich program you write below, you should also add at least 3 to 5 additional tests of your own (total of at least 10) in the custom_tests file that show thoughtful effort to develop comprehensive test coverage. When you add a test, also document your work by including comments in the custom_tests file that explain why you included each test and how the tests relate to one another. The tests supplied with the default SanityCheck are a start that you should build on, with the goal of finding and fixing any bugs before submitting, just like how a professional developer is responsible for vetting any code through comprehensive testing before adding it to a team repository. We recommend you run tests early and often (and remember, running tests even makes a snapshot of your code to guard against editing mishaps!). You can also find suggested testing strategies on the testing page, linked to from the Assignments dropdown.
The best way to approach testing on this assignment is:
- Understand the expected program behavior
- BEFORE writing code, write some tests that cover various cases you can think of
- Write your code
- Write more tests to cover additional cases
This is because once you start writing code, you may start to think in terms of how your code works rather than how the code should work, meaning if you omit handling a case in your code, you may also omit covering that case in your testing. Thus, a good strategy is to write some tests before implementing anything, and then as you implement, you can add further tests. Use the tests as a way to gauge your progress and uncover bugs! We provide some testing recommendations in each problem section below.
Background: Unix Filesystem and the Shell Environment
In this assignment, you will write code that interacts with the Unix filesystem and something called shell environment variables. If you need an introduction or refresher on the filesystem, review our Unix guide for tutorials on the tree structure, absolute and relative paths, and various commands to interact with files and directories as a user.
We made a video explaining some of the background information about Unix and the terminal that's necessary for this assignment - make sure to watch it before continuing!
As mentioned in the video above, on a Unix system, programs run in the context of the user's "environment". The environment is a list of key-value pairs that provide information about the terminal session and configure the way processes behave. You have already used the USER environment variable when cloning your assignment repo; USER is set to your SUNet ID when you log into myth. Other variables include PATH (where the system looks for programs to run), HOME (path to your home directory), and SHELL (your command line interpreter).
Explore your environment by trying out the printenv and env commands mentioned in the video, and reading their manual pages. You will be implementing a core part of the printenv program as part of the assignment. As a summary:
- printenv will show your environment variables. Run printenv with no arguments to see your entire environment. Then try printenv USER SHELL HOME. What is the output from a bad request like printenv BOGUS?
- env is a command that allows you to temporarily change environment variables. You can execute something like:
env BINKY=1 OTHERARG=2 ./myprogram
and myprogram will be executed in a temporary environment with all of the original environment variables, plus BINKY set to 1 and OTHERARG set to 2. To see this, run printenv, then run env BINKY=1 WINKY=2 printenv. What changes between the two?
You can also use env with GDB; e.g. if you want to debug a program that is run using env, start gdb prefixed with env, and then run as normal - for instance: env USER=otheruser gdb myprogram
Before moving on: make sure you have understood what environment variables are and what the printenv program does. Also make sure you're familiar with how to use the env command; this will be essential for thorough testing!
1. get_env_value
Your first task is to implement the function get_env_value in util.c, with the following signature:
const char *get_env_value(const char *envp[], const char *key);
The get_env_value function takes in two parameters: envp, an array of environment variable strings, and key, the name of a specific variable of interest. Each element in envp is a string of the form "KEY=VALUE", e.g. "USER=someuser", and envp contains a NULL pointer as its last element - we can use this to know when we have reached the end of the array. Your function should search the array for the element corresponding to key and return its value, or return NULL if it was not found in the array. For example, assume envp looks like the following:
["USER=someuser", "VAR1=VALUE1", "VAR2=VALUE2", NULL]
If we called get_env_value with this array as envp and the following values for key, this is what would be returned:
if key is "USER", the function would return "someuser"
if key is "VAR1", the function would return "VALUE1"
if key is "NOTTHERE", the function would return NULL
if key is "VAR", the function would return NULL
Your function must iterate through the envp array in search of a matching entry, and if it finds the matching entry (an exact match with the variable name), should return a pointer to the portion of the string following the '=' character. It should not make a copy of the value string.
For each entry in the envp array, you can assume that neither KEY nor VALUE will contain an =.
Testing
This function can be tested in isolation with the provided myprintenv.c program, which you do not need to modify or comment, but whose code you should read in order to understand how it calls your get_env_value function. myprintenv behaves similarly to printenv in that you can specify one or more environment variable names as command line arguments and it will print out the value of each of those. You can run myprintenv without arguments to print out all environment variables. You can also write sanitycheck custom tests with myprintenv.
We recommend using the env command to help with testing, both manually and in sanitycheck custom tests! For instance, if you execute
env USER=otheruser ./myprintenv USER
this will change the USER environment variable just for executing myprintenv this one time to be otheruser instead of your SUNet ID. You can then ensure that myprintenv prints out the correct value, otheruser.
Note that there is one special environment variable, "_", whose set value will always differ between your solution and the sample solution. If you wish to test looking up the value for "_", use env to set it temporarily to a different value.
Before moving on: make sure you have thoroughly tested your get_env_value function, making sure to cover various different cases of possible inputs, and that you have written your custom tests. You will use this function later in the assignment, so it's vital to ensure it's thoroughly tested before moving on!
2. scan_token
Note: this problem requires material covered in lecture 5. Please make sure to review the lecture 5 material before beginning this problem.
Make sure you have also understood the strtok function as mentioned in the last lab. Understanding that will help significantly as you implement this function!
Your second task is to implement a function scan_token in util.c, with the following signature:
bool scan_token(const char **p_input, const char *delimiters,
char buf[], size_t buflen);
scan_token is an improved version of strtok from lab2. Such a function to tokenize a string by delimiters is handy to have, but the standard strtok has design flaws that make it difficult to use. The intention of scan_token is to separate a string into tokens in the manner of strtok but with a significantly cleaner design.
scan_token takes in a pointer to a string and the delimiters to use to tokenize it, writes the next token in the string into the specified buffer buf, and returns true; if no more tokens are left, it returns false. Note that scan_token's first parameter is a double pointer, a pointer to a char *. This is necessary because scan_token needs to change the char * itself to advance past characters that it has previously scanned. The caller will thus have to call it several times to tokenize the entire string. Here is an example:
const char *input = "super-duper-awesome-magnificent";
char buf[10];
const char *remaining = input;
while (scan_token(&remaining, "-", buf, sizeof(buf))) {
    printf("Next token: %s\n", buf);
}
// once we get here, `remaining` is the empty string
Running the above code produces this output:
Next token: super
Next token: duper
Next token: awesome
Next token: magnifice
Next token: nt
The function should be implemented as follows, using appropriate string.h functions (see our standard library guide) - the first two steps borrow from how strtok is implemented:
- scan the input string to find the first character not contained in delimiters. This marks the beginning of the next token.
- scan from that point to find the first character contained in delimiters. This delimiter (or the end of the string if no delimiter was found) marks the end of the token.
- write this token as a valid C string to buf, which has space for buflen characters. scan_token should not write past the end of buf.
  - If a token does not fit in buf, the function should write buflen - 1 characters into buf and write a null terminator in the last slot.
- update the pointer pointed to by p_input to point to the next character in the input that follows what was just scanned.
  - If the scanned token consumed all of the remaining input, *p_input should point to the input's null terminator.
  - If the scanned token was too big to fit entirely in buf, then *p_input should point to the character in the input immediately after the buflen - 1 characters that fit in buf. In other words, the next token scanned will start at the first character that would have overflowed buf.
- return true if a token was written to buf, and false otherwise.
scan_token should not emulate the bad parts of strtok's design. Specifically, it should not use static or global variables and should not modify the input string's characters.
You may assume the following about the parameters to scan_token:
- buf is always a valid address to a region of memory that has space for buflen characters
- buflen is always greater than 1
- p_input is always a valid pointer to a pointer
- *p_input is always a well-formed (e.g. null-terminated) C-string (may be the empty string)
- delimiters is always a well-formed C-string containing one or more delimiter chars (i.e. it will never be the empty string)
Note that even if you wish to add checking for some of these assumptions, e.g. determining whether p_input is valid, or that buf actually has buflen characters of space, it's tough to do. Determining whether a pointer is valid, for instance, is not solvable in general, and any measure to detect bad pointers will be half-hearted at best. As the implementer, at times you have little choice but to clearly document your assumptions and assume the client will adhere to them, and write your code accordingly.
Testing
This function can be tested in isolation with the provided tokenize.c program, which you do not need to modify or comment, but whose code you should read over to understand how it calls your scan_token function. You can also write sanitycheck custom tests with tokenize.
If you execute ./tokenize, it will use your scan_token function to calculate the number of syllables of various test words. You can also run it by specifying other text you would like to use to test, in this format:
./tokenize <DELIMITERS> <TEXT> <BUFSIZE (OPTIONAL)>
For example, if you would like to tokenize the text "hello I am a C-string" using the delimiters "-" and " ", you could run:
./tokenize " -" "hello I am a C-string"
The first string contains the characters to use as delimiters, and the second string is the text to tokenize. This command should output something like:
./tokenize " -" "hello I am a C-string"
Tokenized: { "hello" "I" "am" "a" "C" "string" }
remaining:
You may optionally specify a third argument which is the size of the buffer to pass when tokenizing. If you do not include this command line argument, the buffer is sized to always have enough space to store the entire token.
Before moving on: make sure you have thoroughly tested your scan_token function, making sure to cover various different cases of possible inputs, and that you have written your custom tests. You will use this function later in the assignment, so it's vital to ensure it's thoroughly tested before moving on!
3. Documenting scan_token
When functions have assumptions, limitations or flaws, it is vital that the documentation makes those clear. Otherwise, developers don’t have the information they need to make good decisions when writing their programs. For example, one of the design flaws of strtok is that it modifies the characters of its first argument. Luckily, this is documented in the BUGS section of the man page (though it should perhaps be emphasized more than just as a minor detail). If we were unaware of this flaw, we might assume the argument wasn't modified, breaking other parts of our program or even introducing potential vulnerabilities.
For this next part of the assignment, your task is to write a "manual page" for your scan_token function. Function documentation like this is different than comments in your actual program code. While header, inline and other comments should be brief and standalone, a manual page reference is more thorough and cohesive. In manual pages with multiple sections, text at the beginning of a section should explain some of the concepts, and should often make some general points that apply to several functions or variables. Additionally, manual page documentation should be written more formally than code comments. As the GNU standard explains, "the only good way to use [code comments] in writing a good manual is to use them as a source of information for writing good text."
In your readme.txt file, we have provided a template outline for your manual page. Fill in the remaining components to fully document your scan_token function. Here is the starter template, for reference:
scan_token DOCUMENTATION
INSTRUCTIONS: Fill in the sections marked with a TODO below.
Your documentation should be original (i.e., please do not copy and paste from the assignment spec).
NAME
scan_token - # TODO write a one-sentence description of scan_token
bool scan_token(const char **p_input, const char *delimiters,
char buf[], size_t buflen);
ARGUMENTS
const char **p_input - #TODO: write one sentence explaining the p_input argument
const char *delimiters - #TODO: write one sentence explaining the delimiters argument
char buf[] - #TODO: write one sentence explaining the buf argument
size_t buflen - #TODO: write one sentence explaining the buflen argument
RETURN VALUE
#TODO: write a 1-3 sentence description of the possible return values of scan_token.
Make sure to include a description of what will be stored in the buf argument upon return.
ASSUMPTIONS
#TODO: write 2-5 sentences explaining the assumptions made by your scan_token function.
Here is an example: The scan_token function assumes that the buf argument
has space for buflen characters.
DESCRIPTION
#TODO: write one paragraph explaining the implementation of your scan_token function.
This section should include (high-level) implementation details. You can use your function-header
comment as a starting point for this section.
Tip: when you need to use scan_token later on in your mywhich program, try referring to just the manual page you wrote here. If you find that you need more information in order to effectively use the function, consider adding what might be missing. Your goal for your manual page reference should be that a client can effectively use your function without seeing the code, just like how you can use string functions without seeing their implementations.
4. mywhich
Your final task is to use your scan_token and get_env_value functions to implement the mywhich.c program, which is a simplified version of the Unix which command. It takes the names of executables (e.g. make, cat, emacs, etc.) and prints out their filesystem locations. Read the man page for the Unix version (man which) if you'd like, though note that your mywhich program will differ a bit from the full which behavior. Try out the provided sample solution, e.g. ./samples/mywhich_soln ls or ./samples/mywhich_soln make. For each command name, it prints the full path to the first matching executable it finds or nothing if no matching executable was found. The matched executables are listed one per line in the order that the command names were specified on the command line. In the example below, two of them were found, but no executable named submit was found in any directory in the user's PATH and thus nothing was printed for it.
myth$ ./samples/mywhich_soln emacs submit cp
/usr/bin/emacs
/usr/bin/cp
If no command line arguments are specified, mywhich prints out the directories in the search path, one per line.
This search is intimately related to how commands are executed by the shell. When you run a command such as ls or emacs, the shell searches for an executable program that matches that command name and then runs that program.
Where does it search for executables? You might imagine that it looks for an executable file named emacs in every directory on the entire filesystem, but such an exhaustive search would be both inefficient and insecure. Instead, it looks just directly inside those directories that have been explicitly listed in the user's PATH environment variable. The value for PATH is a sequence of directories separated by colons such as PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/games. When looking for a command, which considers the directories in the order they are listed in PATH and stops at the first one that contains a matching executable. In other words, here mywhich would first see if /usr/local/bin/emacs exists. If it does, mywhich prints it out and stops. If it doesn't, it would then check for /usr/bin/emacs. Then /bin/emacs. And so on. There is a library function called access that will come in handy (more on this later), which can tell you whether a given executable path is valid. Note that this process isn't doing an exhaustive search of all files directly or indirectly contained in a path like /usr/local/bin. It's just seeing if the specified executable name, e.g. emacs, exists directly inside the specified path location, e.g. /usr/local/bin/emacs.
PATH is set by default in your environment to include directories such as /usr/local/bin/ and /usr/bin/ which house the executable files for the standard Unix commands. (The name bin is a nod to the fact that executable files are encoded in binary). For ease of testing, mywhich also supports using the environment variable MYPATH, if it is specified, so that you can customize the path contents without changing your PATH environment variable (which may break other shell functionality). You can specify it with the env command, like this:
myth$ env MYPATH=/tmp:tools ./mywhich submit
tools/submit
There are several core string tasks in this program:
- Getting the value of the MYPATH or PATH environment variable - the starter code uses your get_env_value function to do this
- Tokenizing the specified path to get each individual location you need to search - you will use your scan_token function to do this
- Creating the full path that you wish to check - e.g. taking an individual location like /usr/local/bin and an executable name like emacs and constructing a path with the concatenation of the individual location, a forward slash, and the executable name: /usr/local/bin/emacs. Then you can pass that path as a parameter to the access function to check if it's valid.
Starter Code
mywhich.c is given to you with an incomplete main function that handles the case when mywhich is invoked with no arguments. You should first read and understand this code, work out how to change/extend it to suit your needs, and finally add comments to document your strategy.
Note that you can (and are encouraged to!) change code in mywhich as you wish, to decompose it, etc. Your goal should be to have your main function act as a concise summary of your overall program.
Some concepts to think about when looking over the code:
- When applied to an array, the sizeof operator conveniently returns the actual size of the array. However, as soon as that array is passed as a parameter (it becomes a pointer to the first element) or as soon as we create a pointer to any of its elements, sizeof of that pointer will return 8 bytes instead of the array size because a pointer is 8 bytes. Additionally, note that the array size is not necessarily the same as the string length if it is a string.
- If the user's environment does not contain a value for MYPATH, what does mywhich use instead?
- How does a client properly use scan_token? (see sample uses in both tokenize.c and mywhich.c)
- Do you see anything unexpected or erroneous? We intend for our code to be bug-free; if you find otherwise, please let us know!
The code we provide has been stripped of its comments and it's your job to provide the missing documentation.
Implementation
The program should be implemented as follows:
- If there are no command line arguments, the program prints the directories in the search path, one directory per line. This is already implemented for you in the starter code.
- If there are command line arguments, the program searches for a matching executable for each argument in the order they were specified, and prints the full path to the first matching executable it finds or nothing if no match was found. To do this, for each argument:
  - Take the specified path (the value for MYPATH, if it exists, or for PATH otherwise) and tokenize it using your scan_token function and a buffer of size PATH_MAX. PATH_MAX is the system's limit on the maximum length of a full path (including the null terminator). For each token (which is a single directory path):
    - use that large buffer to construct the full path: e.g. if the token is /usr/local/bin and the executable name is emacs, you want to construct the path /usr/local/bin/emacs. You may assume the constructed path will fit in the PATH_MAX-sized buffer.
    - use the access function to check if that executable path is valid.
      - If it is, print out that path and move on to processing the next command line argument.
      - If it's not, try searching again with the next token
Note that you should not store all the path tokens in an array while tokenizing - you should perform the searches as you tokenize. For this reason, note that if there are multiple command line arguments, you will repeat the tokenization of the search path for each argument, and that's fine. You may assume that the user's MYPATH / PATH variables are always well-formed sequences of one or more paths separated by colons.
Here's more information about the access function:
access is a built-in function that is part of the POSIX standards, which establish a set of C functions for interacting with the operating system. Whereas the standard C library functions provide only simple file reading/writing, the POSIX functions add more comprehensive services, including access to filesystem metadata (e.g. modification time, who can access files), directory contents, and filesystem operations that are necessary for implementing Unix commands like ls and mkdir, which are themselves just executable programs. The function access has the following signature:
int access(const char *pathname, int mode);
It takes in a path, pathname, and permissions, mode, and returns whether or not you have those permissions for the file at that path. To use access to check if an executable path is valid, we will be asking access to check whether we can read and execute the file at that executable path. If we can, it means an executable exists at that location. Otherwise, we assume none exists there.
Therefore, when you call access, the first parameter should be the constructed executable path you wish to check (e.g. /usr/local/bin/emacs), and the second parameter should be a bitmask that is a combination of the bitwise constants R_OK and X_OK (a value with the bits in both of these constants on). In this way, we specify that we want access to check if we have "read" and "execute" permissions for that file.
Be sure to carefully read the man page so you know how to properly interpret the return value from a call to access!
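As a small illustration of the call described above, the sketch below wraps access in a helper. The name is_readable_executable is hypothetical and not part of the assignment; the sketch relies on the documented behavior that access returns 0 when every requested permission is granted and -1 otherwise.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper name: reports whether path names a file that the
 * calling process may both read and execute. */
bool is_readable_executable(const char *path) {
    // R_OK | X_OK has the bits of both constants on, so both permissions
    // are checked in a single call.
    return access(path, R_OK | X_OK) == 0;
}
```

For example, is_readable_executable("/usr/local/bin/emacs") would be true only if a file exists at that path and you have both permissions for it.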
Testing
You can write sanitycheck custom tests with mywhich - we recommend using env and MYPATH to easily specify custom search paths.
Submitting
Once you are finished working and have saved all your changes, check out the guide to working on assignments for how to submit your work. When you submit, you may optionally indicate that you do not plan to make a submission after the on-time deadline. This allows the staff to start grading some submissions as soon as the on-time deadline passes, rather than waiting until after the late period to start grading.
- When in doubt, it's fine to indicate that you may make a late submission, even if you end up submitting on time
- If you do indicate you won't submit late, this means once the on-time deadline passes, you cannot submit again. You can resubmit any time before the on-time deadline, however.
- If you want to change your decision, you can do so any time before the on-time deadline by resubmitting and changing your answer.
- If you know that you will not make a late submission, we would appreciate you indicating this so that we can grade assignments more quickly!
You only need to modify the following files for this assignment: util.c, mywhich.c, custom_tests, readme.txt
We would also appreciate it if you filled out this homework survey to tell us what you think once you submit. We appreciate your feedback!
Grading
Below is the tentative grading rubric. We use a combination of automated tests and manual review to evaluate your submission. More details are given on the page explaining how assignments are graded, which is linked from the Assignments dropdown.
Readme (12 points)
Functionality (82 points)
- Sanity cases (25 points) Correct results on the default sanity check tests.
- Comprehensive/stress cases (40 points) Correct results for additional test cases with broad, comprehensive coverage and larger, more complex inputs.
- Clean compile (2 points) Compiles cleanly with no warnings.
- Clean run under valgrind (10 points) Clean memory report(s) when run under valgrind. Memory errors (invalid read/write, etc) are significant deductions. Every normal execution path is expected to run cleanly with no memory errors or leaks reported. We will not test exception/error cases under valgrind.
- custom_tests (5 points) Your custom_tests file should include at least 10 tests of your own, 3-5 per program, that show thoughtful effort to develop comprehensive testing coverage. Please include comments that explain your choices. We will run your custom tests against your submission as well as review the cases to assess the strength of your testing efforts.
Code Quality (buckets weighted to contribute ~15 points)
The grader's code review is scored into a bucket per assignment part to emphasize the qualitative features of the review over the quantitative. The styleguide is a great overall resource for good program style. Here are some highlights for this assignment:
- Using library functions where possible. If the C library provides functionality needed for a task, you should leverage these library functions rather than re-implement that functionality.
- Use of pointers and memory. We expect you to show proficiency in handling pointers/memory, no unnecessary levels of indirection, correct use of pointee types and typecasts, and so on. For this program, you should not need and should not use dynamic memory (i.e. no malloc/free/strdup).
- Program design. We expect your code to show thoughtful design and appropriate decomposition. Data should be logically structured and accessed. Control flow should be clear and direct. When you need the same code in more than one place, you should unify, not copy and paste.
- Style and readability. We expect your code to be clean and readable. We will look for descriptive names, defined constants (not magic numbers!), and consistent layout. Be sure to use the most clear and direct C syntax and constructs available to you.
- Documentation. You are to document both the code you wrote and what we provided (except for tokenize.c and myprintenv.c). We expect program overview and per-function comments that explain the overall design along with sparing use of inline comments to draw attention to noteworthy details or shed light on a dense or obscure passage. The audience for the comments is your C-savvy peer.
Post-Assignment Check-in
How did the assignment go for you? We encourage you to take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. Once you finish this assignment, you will have written your own implementation of a standard Unix utility program and an improved version of a standard library function, along with comprehensive documentation. That's a pretty darn impressive accomplishment, especially so given only a few weeks of learning about Unix and C -- wow!
To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to fill any gaps in your understanding before moving on from a task. They could also be useful as review before the exams.
- The string library contains several functions to perform a form of string comparison, e.g. strncmp, strstr, strchr, strspn, ... Explain the differences between the functions and identify a situation in which each is appropriate.
- Write a C expression that converts a hexadecimal digit char to its numerical value, i.e. '1' => 1, 'f' => 15.
- The first parameter to the function scan_token is of type const char **. Explain the purpose of the extra level of indirection on that argument.
- It is controversial (see section 13) whether to add . (the current directory) to your PATH. Why might it be convenient? Why does it introduce a security risk?
- Why is good function documentation (like manual pages) critical for good software development?