Assignment 3: Stanford Shell

This handout was adapted from Jerry Cain’s Spring 2018 offering.

Kudos to Randal Bryant and Dave O’Hallaron of Carnegie Mellon for assignment inspiration and for parts of this handout. Huge thanks to Truman Cranor for writing the command line parser using tools you’ll learn in CS143, which you should all take someday because it’s an amazing class.

You’ve all been using shells to drive your conversations with UNIX systems since the first day you logged into a myth. It’s high time we uncover the shell’s magic by leveraging the basic-shell we built in lecture together and extending it to support process control, job lists, signals, pipelines, and I/O redirection – all while managing the interprocess concurrency problems that make a shell’s implementation a genuinely advanced systems programming project. There’s lots of neat code to write, and with your smarts and my love to guide you, I’m confident you can pull it off.

Due Date: Saturday, July 20th, 2019 at 11:59 p.m. (if you submit by Friday night, you'll get an on-time bonus)

Related lecture material: all the multiprocessing lectures

All coding should be done on a myth cluster machine, as that’s where we’ll be testing all assign3 submissions. You should clone the master git repository we’ve set up for you by typing:

git clone /usr/class/cs110/repos/assign3/$USER assign3
      

Doing so will create an assign3 directory within your AFS space, and you can descend into that assign3 directory and code there. There’s a working sanitycheck for this assignment, and your repo includes a soft link to a fully functional solution. When in doubt about how something should work, just run my solution (which can be found at ./samples/stsh_soln) to see what it does and imitate the solution.

Opening notes

I personally think this is the hardest assignment of the quarter (others might disagree), but it is also my favorite assignment of the quarter. This is how real shells work – you can clone bash’s source code or zsh’s source code and see very similar patterns to what you’re writing in this assignment.

The starter code is a functional shell, which you are asked to extend. It’s worth compiling the starter code and playing with that shell to see what it can currently do. It’s also worth running the sample solution and spending some time with that to see what you’re asked to build. As you’re working on this project, test very often to ensure you haven’t broken anything!

It won’t take you nearly as long to figure out the starter code as it did for Assignment 1, but you should still expect to spend some time getting to know pipeline, command, and friends.

You should know your way around your own shell before building this one. You should be familiar with the following things:

Builtin stsh Commands

Your Assignment 3 shell needs to support a collection of builtin commands that should execute without creating any new processes. You might want to run the sample solution and play around to see how to interact with these builtins.

The required builtins are:

quit, exit, and jobs are already implemented for you. You’re responsible for implementing the others and ensuring the global job list is appropriately updated.

Getting Started

Inspect the stsh.cc file we give you. This is the only file you should need to modify. The core main function you’re provided looks like this:

int main(int argc, char *argv[]) {
  pid_t stshpid = getpid();
  installSignalHandlers();
  rlinit(argc, argv); // configures stsh-readline library so readline works properly
  while (true) {
    string line;
    if (!readline(line)) break;
    if (line.empty()) continue;
    try {
      pipeline p(line);
      bool builtin = handleBuiltin(p);
      if (!builtin) createJob(p); // createJob is initially defined as a wrapper around cout << p;
    } catch (const STSHException& e) {
      cerr << e.what() << endl;
      if (getpid() != stshpid) exit(0); // if exception is thrown from child process, kill it
    }
  }

  return 0;
}

The general idiom here is the same one used in the basic-shell example from lecture, except that this version is in C++ and supports many more features. The readline function prompts the user to enter a command, and a pipeline record is constructed around it. readline and pipeline (which is different from the pipeline function you implemented for Assignment 2) are implemented via a suite of files in the stsh-parser subdirectory, and for the most part you can ignore those implementations. You should, however, be familiar with the type definitions of the command and pipeline types, which are right here:

const size_t kMaxCommandLength = 32;
const size_t kMaxArguments = 32;
struct command {
  char command[kMaxCommandLength + 1]; // '\0' terminated
  char *tokens[kMaxArguments + 1]; // NULL-terminated array, C strings are all '\0' terminated
};

struct pipeline {
  std::string input;   // empty if no input redirection file to first command
  std::string output;  // empty if no output redirection file from last command
  std::vector<command> commands;
  bool background;

  pipeline(const std::string& str);
  ~pipeline();
};

Check out what the initial version of stsh is capable of before you add any new code.

Milestones

The best approach to implementing anything this complex is to invent a collection of milestones that advance you toward your final goal. Never introduce more than a few lines of code before compiling and confirming that the lines you added do what you expect. I repeat: Never introduce more than a few lines of code before compiling, testing, and confirming that the additional lines do what you expect. View everything you add as a slight perturbation to a working system that slowly evolves into the final product. Try to understand every single line you add, why it’s needed, and why it belongs where you put it.

Here is a sequence of milestones I’d like you to work through in order to get started:

  1. Descend into the stsh-parser directory, read through the stsh-readline.h and stsh-parse.h header files for data type definitions and function/method prototypes, type make, and play with the stsh-parse-test to gain a sense of what readline and the pipeline constructor do for you. In general, the readline function is like getline, except that you can use your up and down arrows to scroll through your history of inputs (neat!). The pipeline record defines a bunch of fields that store all of the various commands that chain together to form a pipeline. For example, the text cat < /usr/include/stdio.h | wc > output.txt would be split into two commands – one for the cat and a second for the wc – and populate the vector<command> in the pipeline with information about each of them. The input and output fields would each be nonempty, and the background field would be false.
  2. Get a pipeline of just one command (e.g. sleep 5) to run in the foreground until it’s finished. Rely on a call to waitpid to stall stsh until the foreground job finishes. Ignore the job list, ignore the SIGCHLD handler, don’t worry about background jobs, pipelining, or redirection. Don’t worry about programs like emacs just yet. Focus on these executables instead: ls, date, sleep, as their execution is well understood and predictable.
  3. Establish the process group ID of the job to be the PID of the process by investigating the setpgid system call. When you run stsh from the standard Unix shell, note that stsh is running in the foreground. If your shell then creates a child process, by default that child will also be a member of the foreground process group, and you don’t want that. Since typing ctrl-c sends a SIGINT to every process in the foreground group, typing ctrl-c would send a SIGINT to your shell as well as to every process your shell created. The notion of a group ID isn’t all that important yet, because at the moment a process group consists of only one process. But we’ll eventually be dealing with jobs comprised of many processes in a pipeline, and we’ll want a single process group ID to represent all of them.
  4. Read through stsh-job-list.h, stsh-job.h, and stsh-process.h to learn how to add a new foreground job to the job list, and how to add a process to that job. Add code that does exactly that to the stsh.cc file, right after you successfully fork off a new process. After your waitpid call returns, remove the job from the job list. If it helps, inline cout << joblist; lines in strategically chosen locations to confirm your new job is being added after fork and being removed after waitpid.
  5. Implement the SIGCHLD handler to reap the resources of a foreground process after it exits, and suspend stsh’s main thread of execution using a sigset_t, sigprocmask, sigsuspend, and related functions until an examination of the job list proves that the foreground job is no longer the foreground job. Your call to waitpid should be moved into the SIGCHLD handler, and that should be the only place in your entire solution – even the final one you submit for the 100% I know you have in you – with a waitpid call.
  6. Install handlers so that ctrl-z and ctrl-c stop and kill the foreground process instead of the shell itself. If, for instance, a sleep 500 is running in the foreground, you may want to kill it by pressing ctrl-c. When you press ctrl-c, the OS sends a SIGINT to your shell, which, unless handled, will terminate your shell. If, however, you install a function to handle SIGINT, that handler can (and should) forward the SIGINT on to the foreground job, should one exist. The story for ctrl-z is similar, except ctrl-z prompts the OS to send your shell a SIGTSTP. If you install a custom handler to intercept SIGTSTP, you can forward the SIGTSTP on to the foreground job, should one exist. In both cases, the kill function is your friend and will contribute to your implementation of these handlers.
  7. Implement the fg builtin, which takes a stopped process – stopped presumably because it was running in the foreground at the time you pressed ctrl-z – and prompts it to continue, or takes a process running in the background and brings it into the foreground. The fg builtin takes a job number, translates that job number to a process group ID, and, if necessary, forwards a SIGCONT on to the process group via a call to kill(-groupID, SIGCONT). Right now, process groups consist of just one process, but once you start to support pipelines (e.g. cat words.txt | sort | uniq | wc -l), you’ll want fg to bring the entire job into the foreground, and if all relevant processes are part of the same process group, you can achieve this with a single kill call. Of course, if the argument passed to fg isn’t a number, or it is but it doesn’t identify a real job, then you should throw an STSHException that’s wrapped around a clear error message saying so.
  8. Update the SIGCHLD handler to detect state changes in all processes under stsh’s jurisdiction. Processes can exit gracefully, exit abnormally because of some signal, or be terminated by ctrl-c. Processes can halt because of a signal or a ctrl-z. And processes can continue because of a signal or an fg builtin. The job list needs to remain in sync with the state of your shell’s world, and waitpid is the perfect function to tell you about changes. You’re already familiar with WNOHANG, WUNTRACED, WIFEXITED, WEXITSTATUS, etc. Read through waitpid’s man page to get word on some Linux-specific flags and macros that tell you about processes that have started executing again. Buzzwords include WCONTINUED and WIFCONTINUED, so read up on those.
  9. Add support for background jobs. The pipeline constructor already searches for a trailing & and records whether the pipeline should run in the background. If it should, you still add information about the job to the job list, but you immediately return to the shell prompt without waiting.
  10. Add support for slay, halt, cont (the process-oriented commands) and bg (which prompts all processes in a single job to continue in the background), and use some of the sample user programs I include in your repo (int, fpe, tstp, spin, split) to test all of these.

The following are additional milestones you need to hit on your way to a fully functional stsh. Each of these bullet points represents something larger.

Shell Driver

Note: You don’t have to understand this section or how to use the shell driver. However, it will be useful if you want to write your own tests. (That’s useful for quickly testing the shell each time you make a change, alerting you if you’ve broken anything unexpected.) The sanitycheck tests are not thorough.

The stsh-driver program (there’s a copy of it in your repo) executes stsh as a child process, sends it commands and signals as directed by a trace file, and allows the shell to print to standard output and error as it normally would. The stsh process is driven by the stsh-driver, which is why we call stsh-driver a driver.

Go ahead and type ./stsh-driver -h to learn how to use it:

$ ./stsh-driver -h
      Usage: ./stsh-driver [-hv] -t <trace> -s <shell> [-a <args>]
      Options:
      -h         Print this message
      -v         Output more information
      -t <trace> Trace file
      -s <shell> Version of stsh to test
      -a <args>  Arguments to pass through to stsh implementation
      

We’ve also provided several trace files that you can feed to the driver to test your stsh. If you drill into your repo’s samples symlink, you’ll arrive at /usr/class/cs110/samples/assign3, which includes not only a copy of my own stsh solution, but also a directory of shared trace files called scripts. Within scripts, you’ll see simple, intermediate, and advanced subdirectories, each of which contains one or more trace files you can use for testing.

Run the shell driver on your own shell using trace file bg-spin.txt by typing this:

./stsh-driver -t ./samples/scripts/simple/bg-spin.txt -s ./stsh -a "--suppress-prompt --no-history"
      

(The -a "--suppress-prompt --no-history" argument tells stsh not to emit a prompt or use the fancy readline history features, since those confuse sanitycheck and autograder scripts.)

Similarly, to compare your results with those generated by my own solution, you can run the driver on the ./samples/stsh_soln shell by typing:

./stsh-driver -t ./samples/scripts/simple/bg-spin.txt -s ./samples/stsh_soln -a "--suppress-prompt --no-history"
      

The neat thing about the trace files is that they generate the same output you would have gotten had you run your shell interactively (except for an initial comment identifying the output as something generated via stsh-driver). For example:

$ ./stsh-driver -t ./samples/scripts/advanced/simple-pipeline-1.txt -s ./samples/stsh_soln -a "--suppress-prompt --no-history"
      # Trace: simple-pipeline-1
      # ------------------------
      # Exercises support for pipes via a foreground pipeline with
      # just two processes.
      stsh> /bin/echo abc | ./conduit --count 3
      aaabbbcccdddeeefffggghhhiiijjj
      

The process IDs listed as part of a trace’s output will differ from run to run, but otherwise your output should be exactly the same as that generated by my solution.

Tips and Tidbits

Submitting your work

Once you’re done, you should test all of your work as you normally would and then run the infamous submissions script by typing ./tools/submit.

Grading

Your assignments will be rigorously tested using the tests we expose via sanitycheck plus a whole set of others. I reserve the right to add tests and change point values if during grading I find some features aren’t being exercised, but I’m fairly certain the breakdown presented below will be a very good approximation regardless.

Basic Tests (47 points)

Intermediate Tests (50 points)

Advanced Tests (70 points)

Extra credit

If you submitted the Week 3 survey by Sunday, July 21, 11:59pm, we will give you one additional point towards your functionality score. You'll need to use your Stanford email address to access the form.