Assignment 7: The HTTP File Server

Due: Tue Aug 29 5:59 pm
No late submissions accepted.

Assignment by Daniel Rebelsky

Learning Goals

This assignment focuses on socket programming and on building up your ability to create programs from scratch. While we'll guide you through the process of creating the server, we provide much less starter code than in prior assignments.

Overview

In this assignment, you will implement a simple HTTP file server. There are four parts to the assignment.

  • implementing a TCP file server that responds to simple requests
  • modifying the server to run over HTTP
  • modifying that server to support concurrent and invalid requests
  • making a simple client

To get started on the assignment, clone the starter project using the command

git clone /afs/ir/class/cs107/repos/assign7/$USER assign7

The starter project contains the following:

  • Makefile: the Makefile for compiling
  • stage1.c, stage2.c, stage3.c, stage4.c, get.c
  • util.c and util.h: utility functions and constants for HTML/HTTP and other things (there is no need to read the .c file, although you're welcome to; all documentation is in the header file)
  • string.c and string.h: utility functions and a struct for a simple growable string. You don't have to use it, but it may be useful in the HTTP portion when you're trying to dynamically generate HTML and figure out its length
  • custom_tests: the file where you should add custom tests for your programs
  • samples: a symbolic link to the shared directory for this assignment. It contains:
    • samples/check_stagen for checking your stagen implementation
  • tools: contains symbolic links to the submit and codecheck programs for checking and submitting your work.

Useful information/Changes from previous assignments

This assignment has gone through many fewer revisions than previous assignments, so we expect it to be more challenging. Additionally, since there is more freedom in your solutions, we have a slightly different testing framework. We provide tests in samples/test*; to use a test, just run samples/test_stage1 from your assignment directory. You can ignore the Killed message: as long as you see "Looks good!", you're passing the test cases we've given you. Note that the tests take a decent amount of time for the later parts of the assignment and won't work if you happen to run on the same machine as someone else in 107.

You can also use python3 -m pdb samples/test_stagen to get a gdb-like interface to step through the Python code and see where tests aren't working (feel free to ask on Ed about this), and you can copy and edit the test cases if you want (e.g., I found it helpful to occasionally remove the stdout=DEVNULL to allow printf-style debugging). For debugging on this assignment, nc will also be invaluable: be sure to read the man page or refer back to lecture slides, and when you get to the HTTP portions, remember the -C flag.

In general, where this assignment lacks specificity, we've tried to add a test case to cover the expected behavior. The stages are intended to build on each other, so it would be a reasonable approach to copy your code from one to the next.

Error handling

As mentioned in lecture, you should handle errors as they are potentially encountered: we reserve the right to test that your code appropriately handles error conditions. In general, what we recommend for this assignment is using perror("some message here"); followed by exit(1); for server-level errors, and just removing the client for errors local to an individual client.
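
As a minimal sketch of the server-level half of that recommendation, here is one way to wrap calls that return a negative value on failure. The helper name check_or_die is hypothetical (it is not part of the starter code), and you are free to structure your error handling differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not in the starter code): pass a successful
 * return value through unchanged, or report the failure and exit.
 * Intended for server-level calls such as socket, bind, and listen,
 * which return -1 and set errno when they fail. */
int check_or_die(int result, const char *what) {
    if (result < 0) {
        perror(what);   /* prints "what: <description of errno>" */
        exit(1);
    }
    return result;
}
```

A call site might look like int fd = check_or_die(socket(AF_INET, SOCK_STREAM, 0), "socket");. For an error local to one client (say, a failed read), you would instead close that client's descriptor and keep serving the others.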

1. Implementing a TCP file server that responds to simple requests

Your first goal should be to write a program (in stage1.c) that listens on a given port (specified as the first argument). It doesn't need to do anything besides accept a single request. At this point, samples/check_stage1 should pass.
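
The socket setup for this step can be sketched as follows. This is one possible shape under our own assumptions: the helper name open_listener, the IPv4-only address, the backlog of 16, and the SO_REUSEADDR option are illustrative choices, not requirements of the assignment.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on `port` on all
 * IPv4 interfaces. Returns the socket descriptor, or -1 after
 * reporting the failing call with perror. */
int open_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    /* Optional: lets you restart the server on the same port quickly. */
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
        perror("setsockopt");
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);   /* port 0 asks the OS to pick one */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return -1;
    }
    if (listen(fd, 16) < 0) {
        perror("listen");
        close(fd);
        return -1;
    }
    return fd;
}
```

In main, you would convert the first argument to a port number, call the helper, exit(1) if it returns -1, and then call accept(fd, NULL, NULL) once to take the single connection this step requires.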

Our next step will be to write an echo server (in stage2.c): continue looping forever, and in each loop, accept a connection from the client, read a single line, and write that line back before closing the connection. At this point, samples/check_stage2 should pass.
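
Reading "a single line" from a socket takes a little care, because read may return fewer bytes than you asked for and write may accept fewer bytes than you gave it. The sketch below shows one simple way to handle both; the names read_line and write_all are hypothetical, and reading one byte at a time is chosen for clarity rather than speed.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read from fd until a newline has been stored, the buffer is full,
 * or the peer closes the connection. The result is NUL-terminated.
 * Returns the number of bytes stored, or -1 on a read error. */
ssize_t read_line(int fd, char *buf, size_t cap) {
    if (cap == 0) return -1;
    size_t n = 0;
    while (n + 1 < cap) {
        char c;
        ssize_t got = read(fd, &c, 1);
        if (got < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        if (got == 0) break;                /* end of stream */
        buf[n++] = c;
        if (c == '\n') break;
    }
    buf[n] = '\0';
    return (ssize_t)n;
}

/* Keep calling write until all len bytes are sent.
 * Returns 0 on success, -1 on a write error. */
int write_all(int fd, const char *buf, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t put = write(fd, buf + sent, len - sent);
        if (put < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        sent += (size_t)put;
    }
    return 0;
}
```

The echo loop is then: accept a client, read_line into a buffer, write_all the same bytes back, and close the client. A -1 from either helper is an error local to that client, so close it and move on rather than exiting.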

In the next step, we want to build out the listing functionality (in stage3.c). First, modify the server to take in two parameters: the first should be the path to serve and the second should be the port number. Instead of echoing back the line, read the path it specifies, and if it's a directory, write the entries to the stream. Otherwise, if it's a regular file, just output the file contents. Return 404 if the path refers to a non-existent file/directory. Make sure to only allow reading of paths under the path specified; you can use canonicalize to help with this. At this point, samples/check_stage3 should pass.

2. Modifying the server to run over HTTP

HTTP Overview

Feel free to read over the Wikipedia page for HTTP, but for the purposes of this assignment, all we need to know is that the client request will start with a line like

GET /pathname/file%20with%20space HTTP/1.1

that ends with \r\n. We have three parts separated by spaces: the verb (which you can ignore for this assignment), the path, which is URL encoded, and the protocol version (which you can ignore; you don't need to check its value). Our response will look something like the following. Note that every line should end with \r\n, and note that the numbering is just for easy reference below (you should not actually write it out!).

1. HTTP/1.1 200 OK
2. Content-Type: text/html
3. Connection: Close
4. Content-Length: 11
5. 
6. Hello world

In broad strokes, there are two parts to this: our headers and the content. The above are a pretty minimal set of headers, but enough that the content should render correctly. In the first line, we set the response code: if it was a successful request, you can write HTTP/1.1 200 OK; otherwise, we'll want to give some error code. For the purposes of this assignment, you can always just write out HTTP/1.1 404 Not Found (feel free to just use the provided NOT_FOUND constant). In the second line, we set the MIME type of the response. For the directory listings, you can use text/html and for the files, you can use the result of get_mime_type. Line 3 tells the client that we will close the connection after writing the output (we will not listen for more requests on the same connection). Line 4 specifies the size of our output in bytes. Line 5 is a required empty line to separate our content, and line 6 starts our 11 bytes of content.
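
To make both halves of the exchange concrete, here is a sketch that pulls the decoded path out of a request line and formats a successful response. All three function names are hypothetical, the buffer sizes are arbitrary, and util.h may already provide helpers for some of this, so check there first.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode %XX escapes in place, e.g. "a%20b" becomes "a b". */
void url_decode(char *s) {
    char *out = s;
    while (*s != '\0') {
        if (s[0] == '%' && isxdigit((unsigned char)s[1])
                        && isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Copy the decoded path (the second space-separated token) into
 * `path`. Returns 0 on success, -1 if the line is malformed or the
 * path does not fit. The verb is read but otherwise ignored. */
int parse_request_path(const char *line, char *path, size_t cap) {
    char verb[16];
    char raw[1024];
    if (sscanf(line, "%15s %1023s", verb, raw) != 2) return -1;
    if (strlen(raw) >= cap) return -1;
    strcpy(path, raw);
    url_decode(path);
    return 0;
}

/* Write the four headers, the empty line, and the body into `out`.
 * Returns the total number of bytes written, or -1 if `out` is too
 * small. Content-Length counts only the body bytes. */
int format_response(char *out, size_t cap, const char *mime,
                    const char *body, size_t body_len) {
    int n = snprintf(out, cap,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Connection: Close\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     mime, body_len);
    if (n < 0 || (size_t)n >= cap) return -1;
    if (body_len > cap - (size_t)n) return -1;
    memcpy(out + n, body, body_len);   /* body may be binary */
    return n + (int)body_len;
}
```

For large files you would more likely write the headers first and then stream the file contents, rather than copying everything into one buffer. Note also that the body length is passed explicitly instead of using strlen, since file contents may contain zero bytes.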

HTML Overview

Our minimal valid HTML file listing looks roughly like

<!doctype HTML>
<html>
    <body>
        <h1>/</h1>
        <ul>
            <li><a href="/..">..</a></li>
            <li><a href="/.">.</a></li>
            <li><a href="/Makefile">Makefile</a></li>
        </ul>
    </body>
</html>

You don't need to worry about indentation, but you should make sure to include an h1 element with the current path, and an li element containing an a element for each link.
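
Here is a sketch of generating such a listing into a fixed-size buffer. It takes the entry names as an array so that it stands alone; in your server the names would come from reading the directory, and you may prefer the provided growable string over a fixed buffer. The names appendf and format_listing are hypothetical, and this sketch does not escape special characters in file names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append printf-style text at offset *used. Returns 0 on success or
 * -1 if the text does not fit in the remaining space. */
static int appendf(char *out, size_t cap, size_t *used,
                   const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out + *used, cap - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap - *used) return -1;
    *used += (size_t)n;
    return 0;
}

/* Build the HTML listing for dir_path. Returns the length of the
 * HTML (useful for Content-Length), or -1 if `out` is too small. */
int format_listing(char *out, size_t cap, const char *dir_path,
                   const char *const *names, size_t count) {
    size_t used = 0;
    /* Avoid a double slash when the directory is the root "/". */
    const char *sep = (strcmp(dir_path, "/") == 0) ? "" : "/";

    if (appendf(out, cap, &used,
                "<!doctype HTML>\n<html><body><h1>%s</h1><ul>\n",
                dir_path) < 0) return -1;
    for (size_t i = 0; i < count; i++) {
        if (appendf(out, cap, &used,
                    "<li><a href=\"%s%s%s\">%s</a></li>\n",
                    dir_path, sep, names[i], names[i]) < 0) return -1;
    }
    if (appendf(out, cap, &used, "</ul></body></html>\n") < 0) return -1;
    return (int)used;
}
```

The return value is exactly the number you need for the Content-Length header, which is the reason to build the whole page in memory before writing anything to the client.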

At the end of this milestone, when you're running the server and on the Stanford WiFi, you should be able to go to mythnn.stanford.edu:portn where nn is the number of your myth machine and portn is the port number you're using. Note that if you're not on the Stanford WiFi, you can use the Stanford VPN to still connect to the Myth machine. (Currently, there are no tests for this part of the assignment.)

3. Modifying the server to support concurrent and invalid requests

Note that if one client is taking too long right now, all other requests will be blocked until we finish serving it. We want to change to a select- or poll-based state-machine approach. (Currently, there are no tests for this part of the assignment.)
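
One possible way to organize the state machine with poll is to keep an array of descriptors alongside a per-client state, and let the state decide which event you wait for. The sketch below shows only that bookkeeping. Every name in it (client_table, table_add, and so on), the two states, and the limit of 64 clients are our own assumptions, not part of the starter code.

```c
#include <poll.h>
#include <unistd.h>

#define MAX_CLIENTS 64

enum client_state { READING_REQUEST, WRITING_RESPONSE };

struct client_table {
    struct pollfd fds[MAX_CLIENTS];        /* what poll watches */
    enum client_state state[MAX_CLIENTS];  /* parallel to fds */
    int count;
};

/* Start watching fd for incoming data. Returns its index, or -1 if
 * the table is full. */
int table_add(struct client_table *t, int fd) {
    if (t->count == MAX_CLIENTS) return -1;
    t->fds[t->count].fd = fd;
    t->fds[t->count].events = POLLIN;
    t->fds[t->count].revents = 0;
    t->state[t->count] = READING_REQUEST;
    return t->count++;
}

/* Move a client to a new state and wait for the matching event:
 * readable while reading the request, writable while responding. */
void table_set_state(struct client_table *t, int i, enum client_state s) {
    t->state[i] = s;
    t->fds[i].events = (s == READING_REQUEST) ? POLLIN : POLLOUT;
}

/* Close client i and fill its slot with the last entry. */
void table_remove(struct client_table *t, int i) {
    close(t->fds[i].fd);
    t->count--;
    t->fds[i] = t->fds[t->count];
    t->state[i] = t->state[t->count];
}

/* Block until some descriptor is ready or the timeout expires.
 * Returns the number of ready descriptors, 0 on timeout, -1 on error. */
int table_wait(struct client_table *t, int timeout_ms) {
    return poll(t->fds, (nfds_t)t->count, timeout_ms);
}
```

The main loop would call table_wait, then walk the table and do one non-blocking step for each entry whose revents is set: accept on the listening socket, or one read or write for a client. Because table_remove moves the last entry into slot i, re-examine index i after a removal instead of advancing. Removing a client is also the natural response to an invalid request or a client-local error.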

4. Write a simple HTTP GETter

Write a simple client that takes in a URL and PATH (as command-line arguments) and sends a GET request to that address. Currently, there are no tests for this part of the assignment, but you should be able to compare your output to that of nc -C, or compare just the response with what curl returns.
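
The two core pieces of such a client can be sketched as below. We are assuming that the host name and port have already been separated out of the URL argument; how you split them is up to you. The function names are hypothetical, and the Host header is included because HTTP/1.1 requires it in requests.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a minimal GET request. `path` should already be URL encoded.
 * Returns the request length, or -1 if `out` is too small. */
int format_get_request(char *out, size_t cap,
                       const char *host, const char *path) {
    int n = snprintf(out, cap,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}

/* Resolve host:port and connect over TCP, trying each address that
 * getaddrinfo returns. Returns a connected socket, or -1 on failure. */
int connect_to(const char *host, const char *port) {
    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */

    struct addrinfo *results;
    if (getaddrinfo(host, port, &hints, &results) != 0) return -1;

    int fd = -1;
    for (struct addrinfo *ai = results; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;
        close(fd);                     /* this address failed: try next */
        fd = -1;
    }
    freeaddrinfo(results);
    return fd;
}
```

After connecting, the client writes the formatted request to the socket and then reads until read returns 0, printing everything it receives. Since the request asks the server to close the connection, end of stream marks the end of the response.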