Written by Julie Zelenski
The assignment writeup is the place to go for the formalities (required specifications, due date, grading standards, logistics, and so on). This is the companion advice page where we offer hints and recommendations and answer common question that we expect may come up along the way. You may want to skim this page now to get a sense of what's covered and return here on an as-needed basis for further review when/if these issues arise. We will also update this page in real-time with any late-breaking issues that come up while students are working on the assignment.
Advice for assignment 1
Assignment 1 consists of two utility programs: one to write from scratch and another that you modify/extend from our given starter code. Neither program requires creating a large amount of new code, but it can take some time to get acclimated to the terse, cryptic nature of C and work through the inevitable bugs that result from C's ubiquitous use of pointers, arrays, and strings. We cannot over-emphasize how important it is to dedicate yourself to habits that allow you to do your best work. All that your CS106 instructors taught you about the value of decomposition, incremental development, readability, and early-and-often testing is even more true in the world of C. The woodworker's mantra "measure twice, cut once" is the right attitude to have here, and will dramatically reduce your total time spent on assignments.
Our first recommendation is that you check out our best practices for completing cs107 assignments for broadly applicable advice for tackling any assignment!
Use sanitycheck early and often!
Be sure you ready to make use of our sanity check tool. This handy tool automates comparison of your program's output to the sample. The default sanity check runs a few simple cases that are great for verifying basic functionality. You can also provide your own custom cases to run more comprehensive tests. Custom sanity check is your friend -- take some time now to get comfortable using it and you'll enjoy a payoff on this and all subsequent assignments.
Advice/questions specific to mywhich
How can I parse out the individual directories from the user's PATH?
Given the search path (either from PATH environment variable or passed as command-line argument), you need to divide it into directories at the position of each colon :. The C library provides the tokenizing functions strtok and strsep which sound like the right idea, but sadly, these functions have an awkward and confusing interface which makes them more trouble than they are worth. Instead approach this as a substring task. (1) Given the start and end positions of a substring, how might you extract just that portion out of string? (2) How can you get the start and end position of the substring that represents each directory within PATH? (Hint: strchr) Combine (1) and (2) and you're good to go!
Should I use strtok or strsep to separate the directories from PATH?
These are strongly discouraged, but not expressly prohibited. (See above answer with a different approach to parsing the colon-delimited path.)
Should I separate out all the directories from PATH and store into an array of strings?
No. Attempting to first parse PATH into directories and only then process the directories adds the complication of allocating/managing/deallocating an intermediate memory structure for the array and strings, which we don't want you to do. The preferred approach as given in the writeup is "extract-and-test". Searching for a command should extract a single directory from PATH and test for the presence of a matching executable in that directory. If not found, extract the next directory from PATH and test, and so on until PATH is exhausted. The memory needs are simplified to just what is needed to process a single directory at a time.
How can I assemble a full path from a directory and command name?
Declare a stack buffer of a large size into which you will construct the full path. The appropriate value for "large size" is the constant PATH_MAX defined in the include file <limits.h>. PATH_MAX is the system's limit on the maximum length of a full path. Now you want to fill the buffer with the concatenation of the directory, a forward slash, and command name. My preferred technique for this is to use one combined sprintf to write to the buffer, but you could also do it using strcpy/strcat or assign single characters using array notation.
How can I test if a file exists and is executable?
The C libraries include a function access() that will do the job nicely (man access for more detail). The mode you want to pass is X_OK, which will check if the file both exists and is executable.
Advice/questions specific to mygrep
Invalid/malformed regular expression patterns that you may assume won't happen
Below are the four reasons that a pattern can be malformed. We will not test on malformed patterns and you do not have to detect or handle them.
*metacharacter not preceded by appropriate character (e.g.,'*'or'^*abc')^metacharacter in a position other than the first (e.g.,'abc^abc')$metacharacter in a position other than the last (e.g.,'abc$abc')- pattern can match empty zero-length string (e.g.,
'a*'or'^'or'$'or'a*.*b*')
The fourth group of patterns that match the empty string are technically not malformed, but dealing with an empty string match requires some awkward special-case handling that we would rather you avoid, so we disallow them along with the other malformed patterns. Those patterns that match the empty string might match some non-empty strings as well, it is the potential to match an empty string (and the hassle in handling that case) that makes us disallow them.
Our solution has explicit handling to reject invalid patterns. You do not need to mimic that, we added that simply as a testing aid for you. If you're not sure if a pattern is valid, run our sample solution on it to see if it is rejected.
Does mygrep have an escape character?
No, there is no escape character supported in our version of these regular expressions, so it is not possible to match a string that you want to literally be, for example, $5.50 or char *.
When invoking mygrep on a pattern containing non-letters (spaces, punctuation, metacharacters), the shell's response is baffling to me and my program is not run. What can I do?
Enclose any pattern containing non-letters in single-quotes. This will suppress the special handling that would otherwise apply to those characters that have specific meaning to the shell ($ * ; & > to name just a few). By enclosing the pattern in single-quotes, you are telling the shell to pass the argument along to mygrep exactly as given.
I'm not sure if a particular use of mygrep is valid or whether the matches my program finds are the correct ones. How can I verify my program's behavior is correct?
Compare to our solution! For any command ./mygrep pattern file, try the same invocation on the solution slink/mygrep_soln pattern file and validate that your program has the same behavior as the solution. You can also use custom sanitycheck (sanitycheck instructions) to automate comparisons between your program and the solution.
If both commands get the same strange response from the shell (e.g. "unbound variable", "illegal variable name", ...), it most likely indicates the pattern needs to be enclosed in single-quotes because of special characters. (see previous question) Also note that the solution includes special handling to reject illegal patterns. If you're unsure whether a given pattern is valid and should be included in your testing, try it on the solution. If the solution reports the pattern is illegal, you'll know you can ignore it. We will not test on illegal patterns and your program is not expect to detect or handle them.
(As an implementor) What is stdin and how does it differ from an opened file?
The <stdio.h> header declares stdin for the already-opened FILE* connected to the terminal. Code that reads from an ordinary FILE* is equally capable of reading from stdin without changes -- you will not need to make a special case for either. The only constraint to be aware of is that you can only move forward through stdin (i.e must start at beginning and read to end once with no ability to jump ahead or rewind).
(As a client) What is stdin and how do I use it for testing?
If no filename argument is given to mygrep, it reads from stdin. Instead of having to create and save a file with the text you want to use an input, you can type the input at the terminal in real-time. This is super-convenient for quick testing! Try running mygrep with only the pattern argument:
myth> ./mygrep apattern
Because it was invoked without a filename argument,mygrep is now waiting for input to be entered at the terminal. Type a line and hit return; mygrep will read it, attempt to match the pattern within that line, and will either echo the line with match(es) highlighted or no echo if no match. Enter additional lines to further test matching the pattern; use Control-D to signal the end of input. Very handy!
Frequently asked questions about assign1 in general
What is the difference between commit and submit? Do I need to do both?
Both are essential. hg commit takes a current snapshot of your work and saves it to your history. You will make many commits during development, but this history is stored only in your local repo. As a final step, submit copies your complete work into the class repo and makes it available to the staff for grading. If you only commit, but never submit, we never see your work and will score the empty repo as a zero. Be sure you both commit AND submit!
How do I use the sample executable? How does it relate to sanity check?
Our provided sample executable can be used a reference implementation during testing. Run the solution and your program on the same input and verify the output is the same:
myth> slink/mygrep_soln joy slink/dictionary
... solution output here
myth> ./mygrep joy slink/dictionary`.
... your output here
If your program produces the same result as the sample, all is good. A manual comparison becomes tedious as you have more/larger tests, so we wrote the sanity check tool to help. It automates the process of running a command once on the sample and again on your program, compares the two results, and reports match/mismatch. Read our guide on how to use sanity check.
How do I measure the runtime of a program?
Prefix a command with time to execute and report on the time used to complete it. For example, time ./mygrep joy slink/dictionary will measure your program's runtime searching the dictionary for joy, time slink/mygrep_soln joy slink/dictionary will measure the sample executable's runtime on the same task. The time labeled "user" is the one of interest.
How do I measure the memory use of a program?
Run the program under Valgrind. Look for the line labeled "total heap usage" to see the count of allocations and the number of bytes allocated. You can run also the solution under Valgrind to spy on its memory usage.
How can I determine if my program meets the runtime and memory efficiency benchmarks?
Measure the sample executable on a given input and compare to the measurement of your program running on the same input. If your performance is in the same ballpark as ours (say within a factor of 2 or 3), you're golden. When considering tradeoffs of space versus time, the sample executable shows you the balance we are looking for. If your program uses noticeably more memory or time, it indicate you have some unnecessary or redundant work that should be eliminated.
What is the proper way to handle a fatal error?
Print an informative error message and halt the program by calling exit(1). The non-standard convenience function error() combines print with exit if you'd rather (read its man page for info). We recommend that you report/handle the error directly at the point of detection. Attempting to pass out codes and information to be used further back up the chain is awkward and messier.
Do I need malloc/free for this assignment?
No. You do not need to use malloc/free, and we ask that you not use malloc/free on this assignment. (We won't cover these in lecture until after assign1 goes out anyway.)
I'm getting compiler warnings about initialization discards 'const' qualifier from pointer target type. How do I resolve these?
The const qualifier on a declaration is an indication that type is only to be read, not written. A const char * and a char * are not quite the same type. You can read characters from either, but only the non-const version allows the characters to be changed. While it is perfectly fine to supply a non-const pointer where a const is expected (no cast required or recommended), the inverse will raise a warning from the compiler. Throwing down a typecast to const would quiet the compiler, but that's not the appropriate fix. If write-access is required, a read-only pointer is not going to work and a cast only serves to cover up the mistake. You should instead be fixing the logic to supply the correct pointer. If write-access is not needed, then the simple and correct choice is to add the const qualifier to the declaration.
Which #include files will I need for this assignment?
That can vary from student to student, but here are the ones that we used in our solutions:
#include <stdbool.h> // for being able to use type bool in the code
#include <error.h> // for error()
#include <limits.h> // for PATH_MAX
#include <stdio.h> // for printf()
#include <string.h> // for strchr() strcpy() etc, all the str functions
#include <unistd.h> // for access()
Note that for future assignments you'll be expected to deduce your own list of needed includes. If you know the function name, you can always look at the man page to get the information about how to use the function and also the name of its associated include file.