Assignment 2: C Strings

Due: Wed Apr 21 11:59 pm
Late submissions accepted until Fri Apr 23 11:59 pm

Assignment by Julie Zelenski, with modifications by Nick Troccoli, Katie Creel and Brynne Hurst

Learning Goals

This assignment covers topics in recent string lectures and the second lab. You will be building your skills with:

  • C-strings (both raw manipulation and using string library functions)
  • viewing Unix utility programs from an internal perspective - as an implementer, not just a client
  • exposure to programmatic access of the filesystem and shell environment variables
  • thoroughly documenting your code, and learning about the importance of good documentation

Overview

For this assignment, you will write programs that replicate some of the functionality of the Unix commands printenv and which. This is an especially appropriate way to learn more about C and Unix; implementing the Unix operating system and its command-line tools was what motivated the creation of the C language in the first place! Implementing these programs is a very natural use of C, and you'll see how comfortably it fits in this role. Moreover, when we interact with the filesystem programmatically in C, as we will do on part of this assignment, we can use a C string to represent a path (like /a/b/c) and can construct and dissect paths by string manipulation and string.h library functions. Working with paths is thus practice with C strings!

Take a look at the working on assignments page linked to from the assignments dropdown as you are working on this assignment; it outlines everything you need to know about working through a CS107 assignment, from getting the starter code to testing to submitting.

Note: To remind you of the collaboration policy as it applies to the assignments: you are to do your own independent thinking, design, coding, and debugging. The assignments are for you to solidify your own understanding and develop your individual skills. If you are having trouble completing the assignment on your own, please reach out to the course staff; we are here to help!

To get started on the assignment, clone the starter project using the command

git clone /afs/ir/class/cs107/repos/assign2/$USER assign2

The starter project contains the following:

  • readme.txt: a text file where you will answer questions for the assignment
  • util.c, mywhich.c and Makefile: two files that you will modify, and their Makefile for compiling
  • custom_tests: the file where you will add custom tests for your programs
  • myprintenv.c: a program that calls your get_env_value function in util.c for testing purposes.
  • tokenize.c: a program that calls your scan_token function in util.c for testing purposes.
  • samples: a symbolic link to the shared directory for this assignment. It contains:
    • SANITY.ini and prototypes.h: files to configure and run Sanity Check. You can ignore these.
    • myprintenv_soln, mywhich_soln and tokenize_soln: executable solutions for the programs you will write.
  • tools: contains symbolic links to the sanitycheck and submit programs for testing and submitting your work. It also contains a new codecheck tool for checking for style and other common code issues.

Assignment Support: Through TA helper hours and the discussion forum, our focus will be on supporting you so that you can track down your own bugs. Please ask us how to best use tools (like the brand-new GDB!), what strategies to consider, and advice about how to improve your debugging process or track down your bug. We're happy to help you with these in order to help you drive your own debugging. For this reason, if you have debugging questions during helper hours, please make sure to gather information and explore the issue on your own first, and fill out the QueueStatus questions with this information. For instance, if your program is failing in certain cases, try to find a small replicable test case where the program fails; then, try using GDB to narrow down to what function/code block/etc. you think is causing the issue; then, further investigate just that area to try and find the bug. As another example, if your program is crashing, take a similar approach, but try using GDB and the backtrace command (check it out in lab!) to narrow in on the source of the crash. This information is important for the course staff to effectively help you with debugging. Starting with a future assignment, we will require this information when signing up for helper hours for debugging help, so please make sure to provide as much information as possible.

Using The Strings Library

For this assignment, use of getenv/secure_getenv and strtok/strsep is prohibited since you are writing your own versions of those functions, but the rest of the standard library is at your disposal and its use is strongly preferred over re-implementing its functionality. The functions in the standard library are already written, tested, debugged, and highly optimized. What's not to like? One important consideration, though, is to choose the appropriate function to use. As one example, there are several different functions that do variants of string compare/search (strstr, strchr, strncmp, strspn and so on). While working on this assignment, be sure to choose the approach that most directly accomplishes the task at hand.

Background: Unix Filesystem and the Shell Environment

In this assignment, you will write code that interacts with the Unix filesystem and something called shell environment variables. If you need an introduction or refresher on the filesystem, review our Unix guide for tutorials on the tree structure, absolute and relative paths, and various commands to interact with files and directories as a user.

We made a video explaining some of the background information about Unix and the terminal that may be helpful for this assignment. We recommend watching it before continuing.

As mentioned in the video above, on a Unix system, programs run in the context of the user's "environment". The environment is a list of key-value pairs that provide information about the terminal session and configure the way processes behave. You have already used the USER environment variable when cloning your assignment repo; USER is set to your SUNet ID when you log into myth. Other variables include PATH (where the system looks for programs to run), HOME (path to your home directory), and SHELL (your command line interpreter).

Explore your environment by trying out the printenv and env commands mentioned in the video, and reading their manual pages. You will be implementing your own version of the printenv program as part of the assignment. As a summary:

  • printenv will show your environment variables. Run printenv with no arguments to see your entire environment. Then try printenv USER SHELL HOME. What is the output from a bad request like printenv BOGUS?
  • env is a command that allows you to temporarily change environment variables. You can execute something like:
env BINKY=1 OTHERARG=2 ./myprogram

and myprogram will be executed in a temporary environment with all of the original environment variables, plus BINKY set to 1 and OTHERARG set to 2. To see this, run printenv, then run env BINKY=1 WINKY=2 printenv. What changes between the two?

Note: custom_tests do not support using env - if you want to test with env, you can test with it manually outside of custom tests. You can use env with GDB for debugging; start gdb prefixed with env, and then run as normal - for instance: env USER=otheruser gdb myprintenv

1. get_env_value

Special Note: _ is a special environment variable that you do not have to worry about being different in the output for myprintenv. In other words, you should be able to correctly find the value if _ is passed in to your get_env_value function, but the output of ./myprintenv _ may differ from the sample solution as well as ./myprintenv (so you cannot use it in custom tests).

Your first task will be to implement a function get_env_value that can be written and tested in isolation, and will be used later in the assignment. This function should be implemented in util.c. It is used in the provided myprintenv.c program. You do not need to modify or comment myprintenv.c.

The get_env_value function takes as parameters an array of environment variables, and the name of a specific variable of interest. It should return the value of the variable name of interest in the array of environment variables, or NULL if it was not found in the array.

The required prototype for get_env_value is:

const char *get_env_value(const char *envp[], const char *key);

Each entry in the envp array is a string of the form "KEY=VALUE" (you can assume that neither KEY nor VALUE will contain an =). A NULL pointer is placed after the last entry to mark the end of the entries in the environment array. Your function should iterate through the entries in the environment variables array and look for a matching entry.

For example, if envp contains an entry "USER=troccoli", the associated value for "USER" is "troccoli". Your function should not make a copy of the value string; it should just return a pointer to the portion of the string following the '=' char.
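To make the entry layout concrete, here is a minimal sketch (not the required get_env_value itself) of two building blocks: walking a NULL-terminated array of strings, and locating the value portion of a single "KEY=VALUE" entry without copying it. The helper names count_entries and value_part are hypothetical and are not part of the starter code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: counts the entries in a NULL-terminated array of
// strings, such as the environment array described above.
size_t count_entries(const char *envp[]) {
    assert(envp != NULL);
    size_t n = 0;
    while (envp[n] != NULL) {   // the NULL pointer marks the end of the array
        n++;
    }
    return n;
}

// Hypothetical helper: given one "KEY=VALUE" entry, returns a pointer to the
// VALUE portion inside the same string (no copy is made), or NULL if the
// entry contains no '=' character.
const char *value_part(const char *entry) {
    assert(entry != NULL);
    const char *eq = strchr(entry, '=');
    return eq == NULL ? NULL : eq + 1;
}
```

Note that value_part returns a pointer into the original entry rather than a new string. Matching the key still requires care: "USER" is also a prefix of an entry such as "USERNAME=...", so comparing only the leading characters of an entry is not enough to decide that the key matches.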

You can use our provided myprintenv.c program for testing the function. If you execute ./myprintenv, it will print out all the environment variables, just like printenv; this functionality does not rely on your get_env_value function. If you specify one or more command line arguments, it will print out the value of each of those in the environment; this functionality relies on your get_env_value function. It is also integrated with sanitycheck for your convenience. Be sure to make use of the custom tests option to extend your testing beyond the default cases.

Restrictions: get_env_value should not call the getenv/secure_getenv functions, if you are familiar with them. To operate correctly, it must manually search the environment that was passed via the envp argument.

Testing: try using the env program to temporarily change environment variable values and ensure they print out correctly. For instance, if you execute

env USER=otheruser ./myprintenv USER

this will set the USER environment variable to otheruser, instead of your SUNet ID, just for this one execution of myprintenv. You can then ensure that myprintenv prints out the correct value, otheruser.

2. scan_token

Note: this problem requires the use of double pointers, which are covered in lecture 6. Please make sure to review the lecture 6 material before beginning this problem.

Make sure you have also understood the strtok function as mentioned in the last lab. Understanding that will help significantly as you implement this function!

Your second task will be to implement a function scan_token that can be written and tested in isolation, and will be used later in the assignment. This function should be implemented in util.c. It is used in the provided tokenize.c program. You do not need to modify or comment tokenize.c.

We studied strtok in lab as a code study exercise. Such a function to tokenize a string by delimiters is handy to have, but the standard strtok has design flaws that make it difficult to use. The intention of scan_token is to separate a string into tokens in the manner of strtok but with a cleaner design. The required prototype for scan_token is:

bool scan_token(const char **p_input, const char *delimiters,
                char buf[], size_t buflen);

The function scans the input string to determine the extent of the next token, using the delimiters as separators, and then writes the token characters to buf, making sure to terminate with a null char. The function returns true if a token was written to buf, and false otherwise.

Here are specific details of the function's operation:

  • Your implementation of scan_token should take the same general approach as strtok, meaning it can (and should!) use the handy <string.h> functions such as strspn and strcspn, but it should not replicate the bad parts of its design. Specifically, it should not:
    • use static variables
    • modify the input string
    • have carryover of state between calls. Each call should operate independently.
  • The function separates the input into tokens in the same way that strtok does: it scans the input string to find the first character not contained in delimiters. This is the beginning of the token. It scans from there to find the first character contained in delimiters. This delimiter (or the end of the string if no delimiter was found) marks the end of the token.
  • buf is a fixed-length array to store the token and buflen is the length of the buffer. scan_token should not write past the end of buf. If a token does not fit in buf, the function should write buflen - 1 characters into buf, write a null terminator in the last slot, and the pointer held by p_input should be updated to point to the next character following the buflen - 1 characters in the token. In other words, the next token scanned will start at the first character that would have overflowed buf.
  • Note that the parameter p_input is a const char **. This is a pointer to a string (that is, a pointer to a const char *), which is necessary because the scan_token function should update the caller's string pointer to point to the next character in the input that follows what was just scanned. If the scanned token consumed all of the remaining input, *p_input should point to the input's null terminator.
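The scanning steps described in the list above rest on two <string.h> functions, strspn and strcspn, plus an update made through a double pointer. The following is only a sketch of those pieces in isolation, under the assumption that the arguments are well-formed C-strings; the helper names are hypothetical, and this is not a complete scan_token.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns a pointer to the first character of s that is
// NOT in delimiters (or to the null terminator if every character is one).
const char *skip_leading_delimiters(const char *s, const char *delimiters) {
    assert(s != NULL && delimiters != NULL);
    return s + strspn(s, delimiters);
}

// Hypothetical helper: returns how many characters at the start of s are
// non-delimiters, i.e. the extent of the token that begins at s.
size_t token_extent(const char *s, const char *delimiters) {
    assert(s != NULL && delimiters != NULL);
    return strcspn(s, delimiters);
}

// Hypothetical helper: moves the CALLER's pointer forward by n characters.
// The caller must guarantee that n does not exceed the remaining length.
void advance(const char **p_input, size_t n) {
    assert(p_input != NULL && *p_input != NULL);
    *p_input += n;   // modifies the pointer, never the characters it points to
}
```

Neither strspn nor strcspn writes to its arguments, and none of these helpers keeps any state between calls, which is consistent with the three restrictions listed above.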

Consider this sample tokenization loop:

const char *input = "super-duper-awesome-magnificent";
char buf[10];
const char *remaining = input;

while (scan_token(&remaining, "-", buf, sizeof(buf))) {
    printf("Next token: %s\n", buf);
}

Running the above code produces this output:

Next token: super
Next token: duper
Next token: awesome
Next token: magnifice
Next token: nt
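The last two lines of output come from the buffer limit: "magnificent" has 11 characters, but a 10-byte buffer holds only 9 characters plus the null terminator. A minimal sketch of that kind of bounded copy is shown here; copy_bounded is a hypothetical name, and the sketch assumes that src has at least len readable characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copies at most buflen - 1 of the first len characters
// of src into buf, always null-terminates buf, and returns the number of
// characters actually copied.
size_t copy_bounded(char buf[], size_t buflen, const char *src, size_t len) {
    assert(buf != NULL && src != NULL && buflen > 1);
    size_t n = len < buflen - 1 ? len : buflen - 1;
    memcpy(buf, src, n);
    buf[n] = '\0';   // the terminator goes right after the copied characters
    return n;
}
```

With a 10-byte buffer, copying "magnificent" this way stores "magnifice" and reports 9 characters copied, so the text that remains to be scanned begins at "nt".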

You can use our provided tokenize.c program for testing the function. If you execute ./tokenize, it will use your scan_token function to calculate the number of syllables of various test words, as a test of your function. You can also run it by specifying other text you would like to use to test, in this format:

./tokenize <DELIMITERS> <TEXT> <BUFSIZE (OPTIONAL)>

For example, if you would like to tokenize the text "hello I am a C-string" using the delimiters "-" and " ", you could run:

./tokenize " -" "hello I am a C-string"

The first string contains the characters to use as delimiters, and the second string is the text to tokenize. This command should output something like:

./tokenize " -" "hello I am a C-string"
Tokenized: { "hello" "I" "am" "a" "C" "string" }

You may optionally specify a third argument which is the size of the buffer to pass when tokenizing. If you do not include this command line argument, the buffer is sized to always have enough space to store the entire token.

tokenize.c is integrated with sanitycheck for your convenience. Be sure to make use of the custom tests option to extend your testing beyond the default cases.

Assumptions: We will only test the function on valid arguments. More specifically this means:

  • buf is a valid address to a region of memory that has space for buflen characters
  • buflen > 1
  • p_input is a valid pointer to a pointer
  • *p_input is a well-formed (i.e. null-terminated) C-string (it may be the empty string)
  • delimiters is a well-formed C-string containing one or more delimiter chars (it will not be the empty string)

Note that even if you wish to add checking for some of these assumptions, e.g. determining whether p_input is valid, or that buf actually has buflen characters of space, it's tough to do. Determining whether a pointer is valid, for instance, is not solvable in general, and any measure to detect bad pointers will be half-hearted at best. As the implementer, at times you have little choice but to clearly document your assumptions and assume the client will adhere to them, and write your code accordingly.

3. Documenting scan_token

When functions have assumptions, limitations or flaws, it is vital that the documentation makes those clear. Otherwise, developers don’t have the information they need to make good decisions when writing their programs. For example, one of the design flaws of strtok is that it modifies its first argument. Luckily, this is documented in the BUGS section of the man page (though it should perhaps be emphasized more than just as a minor detail). If we were unaware of this flaw, we might assume the argument wasn't modified, breaking other parts of our program or even introducing potential vulnerabilities.

For this next part of the assignment, your task is to write a "manual page" for your scan_token function. Function documentation like this is different than comments in your actual program code. While header, inline and other comments should be brief and standalone, a manual page reference is more thorough and cohesive. In manual pages with multiple sections, text at the beginning of a section should explain some of the concepts, and should often make some general points that apply to several functions or variables. Additionally, manual page documentation should be written more formally than code comments. As the GNU standard explains, "the only good way to use [code comments] in writing a good manual is to use them as a source of information for writing good text."

In your readme.txt file, we have provided a template outline for your manual page. Fill in the remaining components to fully document your scan_token function. Here is the starter template, for reference:

scan_token DOCUMENTATION
INSTRUCTIONS: Fill in the sections marked with a TODO below.
Your documentation should be original (i.e., please do not copy and paste from the assignment spec).
NAME
    scan_token - # TODO write a one-sentence description of scan_token
    bool scan_token(const char **p_input, const char *delimiters,
                    char buf[], size_t buflen);
ARGUMENTS
    const char **p_input - #TODO: write one sentence explaining the p_input argument
    const char *delimiters - #TODO: write one sentence explaining the delimiters argument
    char buf[] - #TODO: write one sentence explaining the buf argument
    size_t buflen - #TODO: write one sentence explaining the buflen argument
RETURN VALUE
    #TODO: write a 1-3 sentence description of the possible return values of scan_token. 
    Make sure to include a description of what will be stored in the buf argument upon return.
ASSUMPTIONS
    #TODO: write 2-5 sentences explaining the assumptions made by your scan_token function.

    Here is an example: The scan_token function assumes that the buf argument
    has space for buflen characters.
DESCRIPTION
    #TODO: write one paragraph explaining the implementation of your scan_token function.
    This section should include (high-level) implementation details. You can use your function-header
    comment as a starting point for this section.

Tip: when you need to use scan_token later on in your mywhich program, try referring to just the manual page you wrote here. If you find that you need more information in order to effectively use the function, consider adding what might be missing. Your goal for your manual page reference should be that a client can effectively use your function without seeing the code, just like how you can use string functions without seeing their implementations.

4. mywhich

With your utility functions tested and debugged, and your scan_token function documented, you're now ready to build a larger program that uses them: your own version of the which command, a utility used to locate and identify executable programs to run.

The file mywhich.c is given to you with an incomplete main function that sketches the expected behavior for the case when mywhich is invoked with no arguments. You should first read and understand this code, work out how to change/extend it to suit your needs, and finally add comments to document your strategy.

Some concepts to think about when looking over the code:

  • What is PATH_MAX? What is it used for?
  • When applied to an array, the sizeof operator conveniently returns the actual size of the array. However, as soon as that array is passed as a parameter (it becomes a pointer to the first element) or as soon as we create a pointer to any of its elements, sizeof of that pointer will return 8 bytes instead of the array size because a pointer is 8 bytes. Additionally, note that the array size is not necessarily the same as the string length if it is a string.
  • If the user's environment does not contain a value for MYPATH, what does mywhich use instead?
  • How does a client properly use scan_token? (see sample uses in both tokenize.c and mywhich.c)
  • Do you see anything unexpected or erroneous? We intend for our code to be bug-free; if you find otherwise, please let us know!
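The sizeof point in the list above can be checked with a short experiment. This is a sketch only; size_seen_by_callee is a hypothetical name and does not appear in the starter code.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: inside a function, an array argument has already
// become a pointer to its first element, so sizeof reports the size of a
// pointer rather than the size of the caller's array.
size_t size_seen_by_callee(char *buf) {
    assert(buf != NULL);
    return sizeof(buf);
}
```

If a caller declares char path[64], then sizeof(path) in the caller is 64, while size_seen_by_callee(path) returns sizeof(char *). This is why functions that fill a buffer, such as scan_token, take the buffer length as a separate parameter.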

The code we provide has been stripped of its comments and it's your job to provide the missing documentation.

Note: you can (and are encouraged to!) change code in mywhich as you wish, to decompose it, etc. Your goal should be to have your main function act as a concise summary of your overall program.

What does the which command do?

As mentioned in the assignment overview video, the which command searches for a command by name and reports the full path to its matching executable file. Read its man page (man which) and try it out, e.g. which ls or which make or which emacs. The response from which is the matching executable, or no output if not found.

This search is intimately related to how commands are executed by the shell. When you run a command such as ls or emacs, the shell searches for an executable program that matches that command name and then runs that program.

Where does it search for executables? You might imagine that it looks for an executable file named emacs in every directory on the entire filesystem, but such an exhaustive search would be both inefficient and insecure. Instead, it looks only directly inside those directories that have been explicitly listed in the user's PATH environment variable. The default search path includes directories such as /usr/local/bin/ and /usr/bin/ which house the executable files for the standard Unix commands. (The name bin is a nod to the fact that executable files are encoded in binary).

The user can configure their search path by changing the value of their PATH environment variable. The value for PATH is a sequence of directories separated by colons such as PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/games. When looking for a command, which considers the directories in the order they are listed in PATH and stops at the first one that contains a matching executable. In order to match, the file's name must be an exact match and the file must be readable and executable by the user.

How does mywhich operate?

The mywhich program you are to write is similar in operation to the standard which with a few differences. Here's how it works:

  • mywhich uses the environment variable MYPATH for the search path. If no such environment variable exists, it falls back to PATH. (Standard which always uses PATH as the search path)
  • mywhich does not support any command-line flags. (standard which -a prints all matches)
  • mywhich invoked with no arguments prints the directories in the search path, one directory per line. (standard which with no arguments does nothing). This use case is a testing aid to verify that your utility functions work correctly. This is an example of what this could look like, though it may vary when you run it:
myth$ ./mywhich 
Directories in search path:
/usr/bin
/usr/local/bin
/usr/pubsw/bin
/bin
  • mywhich invoked with one or more arguments searches for a matching executable for each argument. The sample output below shows invoking mywhich to find three executables. For each command name, it prints the full path to the first matching executable or nothing if no matching executable was found. The matched executables are listed one per line in the order that the command names were specified on the command-line. In this example, two of them were found, but no executable named submit was found in any directory in the user's PATH and thus nothing was printed for it.
myth$ ./mywhich emacs submit cp
/usr/bin/emacs
/usr/bin/cp

A command will contain only non-special characters (i.e. no ^ # * / $ % which have special meaning in Unix).

You will want to test with different configurations of the search path. Rather than permanently change your actual PATH (which can prevent normal operations from working), we recommend that you change MYPATH, which only affects mywhich and nothing else. Use the env command mentioned in the earlier section and the assignment video to set the value of MYPATH just when running mywhich, like this:

myth$ env MYPATH=/tmp:tools ./mywhich submit
tools/submit

We also encourage using sanity check and the sample solution. For example, for any command ./mywhich arguments, try the same invocation on the solution samples/mywhich_soln arguments and validate that your program has the same behavior as the solution.

Assumptions. You may assume correct usage in all cases and that the user's MYPATH and PATH variables are well-formed sequences of one or more paths separated by colons. You do not need to detect or handle situations where these assumptions do not hold and we will not test on any inputs that violate these assumptions, e.g. no usage of the unsupported -a flag, no malformed values for MYPATH, and no special characters in commands.

Operation. The user's MYPATH (or PATH if there is no MYPATH variable in the user's environment) defines the search path. mywhich considers the directories in the order they are listed in the search path. If the directory contains a readable, executable file whose name is an exact match to the command, the search stops here and prints the full path of that file.
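Checking a directory for a command requires a full path made from the directory and the command name. One possible way to construct it without dynamic memory is sketched here using snprintf; build_path is a hypothetical helper, and this is just one approach under the assumption that the caller supplies a buffer and its size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: writes "dir/command" into out.
// Returns true if the whole path fit (including the null terminator), and
// false if snprintf had to truncate it or reported an error.
bool build_path(char out[], size_t outlen, const char *dir, const char *command) {
    assert(out != NULL && dir != NULL && command != NULL);
    assert(outlen > 0 && strlen(command) > 0);
    int needed = snprintf(out, outlen, "%s/%s", dir, command);
    return needed >= 0 && (size_t)needed < outlen;
}
```

snprintf never writes more than outlen bytes and returns the length the full result would have had, so comparing that return value to outlen detects truncation.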

How can I test if a file is an executable?

You must use the library function access() (man access). access is a built-in function that is part of the POSIX standards, which establish a set of C functions for interacting with the operating system. Whereas the standard C library functions provide only simple file reading/writing, the POSIX functions add more comprehensive services, including access to filesystem metadata (e.g. modification time, who can access files), directory contents, and filesystem operations that are necessary for implementing Unix commands like ls and mkdir, which are themselves just executable programs. The function access checks the user's permissions for a file (that is, how they can access it).

Given a full path and a "mode", the function reports whether you can access that path for that mode. The mode to check for is a combination of R_OK and X_OK. This verifies that you have permission to read the file and the file is executable. Be sure to carefully read the man page so you know how to properly interpret the return value from a call to access.

The "mode" is provided as a bitmask - in particular, if you want both R_OK and X_OK, you must provide a mask with both of those bits on.
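As a minimal sketch of such a check, the two mode constants can be combined with bitwise OR so that both bits are on in the mask. The wrapper name is_readable_executable is hypothetical; access, R_OK and X_OK come from <unistd.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

// Hypothetical helper: true if the calling user may both read and execute
// the file at path. access returns 0 when every requested permission is
// granted and -1 otherwise.
bool is_readable_executable(const char *path) {
    assert(path != NULL);
    return access(path, R_OK | X_OK) == 0;
}
```

A return value of 0 from access means success, which is the opposite of the usual C convention that 0 is false, so comparing against 0 explicitly helps avoid an inverted test.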

One other minor detail is that a path that is a directory may test as readable and executable, and thus appear to be an executable file to mywhich. There is further filesystem information you can use (man stat) to distinguish directories from files, but we don't ask you to do this. mywhich should just use access to filter results to those matching entries that are readable/executable, without concern for if those matches are files or directories. This is the behavior exhibited by the sample solution.

Testing

In addition to the above programs, as part of this assignment you should also add at least 5 additional tests of your own in the custom_tests file that show thoughtful effort to develop comprehensive test coverage. When you add a test, also document your work by including comments in the file that explain why you included each test and how the tests relate to one another. The tests supplied with the default SanityCheck are a start that you should build on, with the goal of finding and fixing any bugs before submitting, just like how a professional developer is responsible for vetting any code through comprehensive testing before adding it to a team repository. We recommend you run SanityCheck early and often (and remember, it even makes a snapshot of your code to guard against editing mishaps!). You can also find suggested testing strategies on the testing page, linked to from the Assignments dropdown.

Code Check

We have added a new tool, codecheck, that can check your code for common style and other issues that are detectable at compile-time. You can run it like the sanitycheck and submit tools, from the tools folder, by running tools/codecheck, which runs it on all .c files in the current directory. You can also run it on individual files, like this:

tools/codecheck file1.c file2.c

This check is NOT exhaustive; it is only meant to identify common style and other code issues. It is also something new for CS107; we are working to tune codecheck to examine issues in a manner consistent with the assignment specs and style guide. Check the course style guide and assignment spec for more code guidelines. If you find any checks that you believe are inconsistent with course guidance, or otherwise overly restrictive, please let us know. We hope you find this helpful!

Submitting

Once you are finished working and have saved all your changes, check out the guide to working on assignments for how to submit your work. We recommend you do a trial submit in advance of the deadline to allow time to work through any snags. You may submit as many times as you would like; we will grade the latest submission. Submitting a stable but unpolished/unfinished version is like an insurance policy. If the unexpected happens and you miss the deadline to submit your final version, this previous submit will earn points. Without a submission, we cannot grade your work.

When you submit, you may optionally indicate that you do not plan to make a submission after the on-time deadline. This allows the staff to start grading some submissions as soon as the on-time deadline passes, rather than waiting until after the late period to start grading.

  • When in doubt, it's fine to indicate that you may make a late submission, even if you end up submitting on time
  • If you do indicate you won't submit late, this means once the on-time deadline passes, you cannot submit again. You can resubmit any time before the on-time deadline, however.
  • If you want to change your decision, you can do so any time before the on-time deadline by resubmitting and changing your answer.
  • If you know that you will not make a late submission, we would appreciate you indicating this so that we can grade assignments more quickly!

You only need to modify the following files for this assignment: util.c, mywhich.c, custom_tests, readme.txt

We would also appreciate it if you filled out this homework survey to tell us what you think once you submit. We appreciate your feedback!

Grading

Below is the tentative grading rubric. We use a combination of automated tests and manual review to evaluate your submission. More details are given in our page linked to from the Assignments dropdown explaining how assignments are graded.

Readme (10 points)

Functionality (80 points)

  • Sanity cases (25 points) Correct results on the default sanity check tests.
  • Comprehensive/stress cases (40 points) Correct results for additional test cases with broad, comprehensive coverage and larger, more complex inputs.
  • Clean compile (2 points) Compiles cleanly with no warnings.
  • Clean run under valgrind (8 points) Clean memory report(s) when run under valgrind. Memory errors (invalid read/write, etc) are significant deductions. Every normal execution path is expected to run cleanly with no memory errors or leaks reported. We will not test exception/error cases under Valgrind.
  • custom_tests (5 points) Your custom_tests file should include at least 5 tests of your own that show thoughtful effort to develop comprehensive testing coverage. Please include comments that explain your choices. We will run your custom tests against your submission as well as review the cases to assess the strength of your testing efforts.

Code Quality (buckets weighted to contribute ~15 points)

The grader's code review is scored into a bucket per assignment part to emphasize the qualitative features of the review over the quantitative. The styleguide is a great overall resource for good program style. Here are some highlights for this assignment:

  • Using library functions where possible. If the C library provides functionality needed for a task, you should leverage these library functions rather than re-implement that functionality.
  • Use of pointers and memory. We expect you to show proficiency in handling pointers/memory, no unnecessary levels of indirection, correct use of pointee types and typecasts, and so on. For this program, you should not need and should not use dynamic memory (i.e. no malloc/free/strdup).
  • Program design. We expect your code to show thoughtful design and appropriate decomposition. Data should be logically structured and accessed. Control flow should be clear and direct. When you need the same code in more than one place, you should unify, not copy and paste.
  • Style and readability. We expect your code to be clean and readable. We will look for descriptive names, defined constants (not magic numbers!), and consistent layout. Be sure to use the most clear and direct C syntax and constructs available to you.
  • Documentation. You are to document both the code you wrote and what we provided. We expect program overview and per-function comments that explain the overall design along with sparing use of inline comments to draw attention to noteworthy details or shed light on a dense or obscure passage. The audience for the comments is your C-savvy peer.

Post-Assignment Check-in

How did the assignment go for you? We encourage you to take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. Once you finish this assignment, you will have written your own implementation of a standard Unix utility program and an improved version of a standard library function, along with comprehensive documentation. That's a pretty darn impressive accomplishment, especially so given only two weeks of learning about Unix and C -- wow!

To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to fill any gaps in your understanding before moving on from a task. They could also be useful as review before the exams.

  • The string library contains several functions to perform a form of string comparison, e.g. strncmp, strstr, strchr, strspn, ... Explain the differences between the functions and identify a situation in which each is appropriate.
  • Write a C expression that converts a hexadecimal digit char to its numerical value, i.e. '1' => 1, 'f' => 15. Be sure to consider string.h functions that can help with the job.
  • The first parameter to the function scan_token is of type const char **. Explain the purpose of the extra level of indirection on that argument.
  • It is controversial (see section 13) whether to add . (the current directory) to your PATH. Why might it be convenient? Why does it introduce a security risk?
  • Why is good function documentation (like manual pages) critical for good software development?

Frequently Asked Questions

We will add common questions and answers here as students work on the assignment. Check back for updates!

Can I use strtok, getenv or strsep on this assignment?

No. For this assignment, you are writing scan_token as the better replacement for strtok. Your mywhich program should call scan_token, not strtok or strsep. Similarly, you are writing your own version of getenv.

Should I separate out all the directories from searchpath and store them into an array of strings?

No. Attempting to first tokenize the searchpath into directories and only then process the directories adds the complication of allocating/managing/deallocating an intermediate memory structure for the array and strings, which we don't want you to do. The preferred approach is "tokenize-and-test". Searching for a command should tokenize a directory from the searchpath and test for the presence of a matching executable in that directory. If it's not found, tokenize the next directory and test, and so on until all directories in the searchpath have been examined. In this way, the memory needs are simplified to just what is needed to process a single directory at a time. If you are searching for multiple commands, you will repeat the tokenization of the search path for each command and that's fine.
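
As a rough illustration of the tokenize-and-test shape, here is a minimal sketch. It deliberately does not use scan_token; it walks the searchpath with strcspn instead. The names search_dirs, dir_test_fn and only_usr_bin are hypothetical and exist only for this example, and only_usr_bin is a stand-in for a real filesystem check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate type: reports whether `command` is present in `dir`. */
typedef bool (*dir_test_fn)(const char *dir, const char *command);

/* Stand-in predicate for demonstration only: pretends that /usr/bin is the
 * one directory containing the command. A real test would check the
 * filesystem instead. */
bool only_usr_bin(const char *dir, const char *command) {
    (void)command;  // unused in this stand-in
    return strcmp(dir, "/usr/bin") == 0;
}

/* Walks a colon-separated searchpath one directory at a time. Each directory
 * is copied into the single caller-provided buffer `dir`, tested, and then
 * overwritten by the next one, so no array of directories is ever built.
 * Returns true on the first hit, leaving the matching directory in `dir`. */
bool search_dirs(const char *searchpath, const char *command,
                 dir_test_fn test, char dir[], size_t dirsize) {
    const char *cur = searchpath;
    while (*cur != '\0') {
        size_t len = strcspn(cur, ":");   // chars before the next ':' or the end
        if (len > 0 && len < dirsize) {   // skip empty or oversized entries
            memcpy(dir, cur, len);
            dir[len] = '\0';
            if (test(dir, command)) {
                return true;
            }
        }
        cur += len;
        if (*cur == ':') {
            cur++;                        // step past the delimiter
        }
    }
    return false;
}
```

The only storage needed is one stack buffer that is reused for each directory. Searching for a second command would simply call search_dirs again from the start of the searchpath.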

How can I assemble a full path from a directory and command name?

Declare a stack buffer of a large size into which you will construct the full path. The appropriate value for "large size" is the constant PATH_MAX defined in the included file <limits.h>. PATH_MAX is the system's limit on the maximum length of a full path (including the null terminator). Now you want to fill the buffer with the concatenation of the directory, a forward slash, and command name.
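
One possible way to do the concatenation is with snprintf, which never writes past the end of the buffer and reports how many characters the full result needs. The sketch below assumes a hypothetical helper named build_path; it is one option, not the required approach.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "dir/command" into buf. Returns false if the
 * result plus its null terminator would not fit in buflen bytes, in which
 * case buf holds a truncated path that should not be used. */
bool build_path(const char *dir, const char *command, char buf[], size_t buflen) {
    // snprintf returns the length the full string needs, excluding the terminator
    int needed = snprintf(buf, buflen, "%s/%s", dir, command);
    return needed >= 0 && (size_t)needed < buflen;
}
```

A caller would typically declare char fullpath[PATH_MAX]; (with <limits.h> included) and pass fullpath and sizeof(fullpath) as the last two arguments.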

What does the const in const char * and const char ** mean?

A const char * means that the characters pointed to by this pointer cannot be changed. It also means that if you create another pointer to point to these same characters, it must also be const; think of const like part of the variable type itself. You can, however, reassign the const char * to point to something else; it is just that you are not able to change the characters at the location to which it points. A const char ** is similar; it means that the characters pointed to by the pointer pointed to by the char ** cannot be changed. You can modify the char * pointed to by the char **, however, or the char ** itself to point to something different. In other words, in both cases const char * and const char ** mean the characters at the location ultimately being referred to cannot be modified.
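
The short sketch below shows which operations const permits. The function names skip_spaces and first_word_length are hypothetical and are not part of the assignment.

```c
#include <stddef.h>

/* Hypothetical helper: advances the caller's pointer past leading spaces.
 * The characters are const (read-only), but the pointer *p_str is not. */
void skip_spaces(const char **p_str) {
    while (**p_str == ' ') {
        (*p_str)++;        // allowed: changes where the caller's pointer points
    }
    // **p_str = 'x';      // not allowed: the characters are const, so this
                           // line would fail to compile if uncommented
}

/* Returns the length of the first space-delimited word in str. */
size_t first_word_length(const char *str) {
    const char *cur = str; // a second pointer to the same chars must also be const
    skip_spaces(&cur);     // cur is reassigned through the const char **
    size_t len = 0;
    while (cur[len] != '\0' && cur[len] != ' ') {
        len++;
    }
    return len;
}
```

Passing &cur lets skip_spaces update the caller's pointer, while the characters themselves stay untouched throughout.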

I'm getting compiler warnings about initialization discards 'const' qualifier from pointer target type. How do I resolve these?

The const qualifier on a declaration indicates that the data is only to be read, not written. A const char * and a char * are not quite the same type. You can read characters from either, but only the non-const version allows the characters to be changed. While it is perfectly fine to supply a non-const pointer where a const is expected (no cast required or recommended), the inverse will raise a warning from the compiler. Throwing down a typecast that discards the const would quiet the compiler, but that's not the appropriate fix. If write-access is required, a read-only pointer is not going to work and a cast only serves to cover up the mistake. You should instead be fixing the logic to supply the correct pointer. If write-access is not needed, then the simple and correct choice is to add the const qualifier to the declaration.
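
The two situations can be sketched as follows. The function names count_colons and uppercase_in_place are hypothetical and chosen only for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Read-only use: declare the local pointer const so it matches the parameter. */
size_t count_colons(const char *path) {
    size_t count = 0;
    // char *cur = path;      // would warn: initialization discards 'const' qualifier
    const char *cur = path;   // fix: no write access is needed, so keep it const
    for (; *cur != '\0'; cur++) {
        if (*cur == ':') {
            count++;
        }
    }
    return count;
}

/* Write access needed: take a non-const parameter rather than casting. */
void uppercase_in_place(char *str) {
    for (char *cur = str; *cur != '\0'; cur++) {
        *cur = (char)toupper((unsigned char)*cur);
    }
}
```

A writable char array can be passed to count_colons without a cast, since supplying a non-const pointer where a const one is expected is always fine. The reverse, passing a const char * to uppercase_in_place, is what the compiler warns about.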