Topic 4 Networking (2)

Leedehai
Wednesday, May 24, 2017

4.5 A taste of HTTP: emulating wget

HTTP (Hypertext Transfer Protocol) is an application-layer protocol widely used on today's web.

Ha, I used this command-line tool, wget, just a week ago.

4.5.1 Description

wget is a command-line utility that, given a Uniform Resource Locator (URL), downloads the file it points to (.html, .js, .jpg, .mov, .zip, or whatever). For example, suppose we want to download a picture of a guru of computer science, Professor Donald Knuth at Stanford.

$ wget https://en.wikipedia.org/wiki/File:KnuthAtOpenContentAlliance.jpg -O DKnuth.jpg
--2017-05-28 08:07:15-- https://en.wikipedia.org/wiki/File:KnuthAtOpenContentAlliance.jpg
Resolving en.wikipedia.org... 198.35.26.96, 2620:0:863:ed1a::1
Connecting to en.wikipedia.org|198.35.26.96|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘DKnuth.jpg’

DKnuth.jpg [ <=> ] 44.27K --.-KB/s in 0.01s

2017-05-28 08:07:16 (3.71 MB/s) - ‘DKnuth.jpg’ saved [45337]

$

Our implementation, web-get, simplifies wget's functionality but nonetheless serves as a good example of the most basic parts of HTTP, since the tool needs to speak HTTP, specifically the HTTP/1.0 dialect, to fulfill its mission.

$ web-get https://en.wikipedia.org/wiki/File:KnuthAtOpenContentAlliance.jpg
Total number of bytes fetched: 45337
$

SIDE NOTE:
Review from EE284 or CS144: the Internet protocol suite in the 4-layer TCP/IP model
Application layer (with typical port number) - BGP(179), DHCP(67,68), DNS(53), FTP(21), HTTP(80), HTTPS(443), IMAP(143), NTP(123), POP3(110), SMTP(25), SSH(22), Telnet(23), ...
Transport layer - TCP, UDP, DCCP, SCTP, RSVP, ...
Internet layer - IP (IPv4, IPv6), ICMP, ECN, IGMP, OSPF, ...
Link layer - ARP, NDP, PPP, MAC (Ethernet, DSL, ISDN, FDDI), ...

4.5.2 The code (KOB)

First and foremost, main() and pullContent() show the program's high-level structure.

#include ...
#include "socket++.h"
using namespace std;

static const string kProtocolPrefix = "http://";
static const string kDefaultPath = "/";

/* prototypes: these functions are defined after main() */
pair<string, string> parseURL(string url);
void pullContent(const pair<string, string>& parsedURL);
void issueRequest(iosockstream& ss, const string& host, const string& path);
void skipHeader(iosockstream& ss);
void savePayload(iosockstream& ss, const string& filename);

int main(int argc, char *argv[]) {
  if (argc != 2) {
    cerr << "Usage: " << argv[0] << " <url>" << endl;
    return 1;
  }
  pullContent(parseURL(argv[1]));
  return 0;
}

/* helper: skip "http://" and split the rest into "host" and "path" */
pair<string, string> parseURL(string url) {
  if (url.compare(0, kProtocolPrefix.size(), kProtocolPrefix) == 0) /* starts with "http://" */
    url = url.substr(kProtocolPrefix.size());
  size_t found = url.find('/');
  if (found == string::npos)
    return make_pair(url, kDefaultPath); /* make_pair is defined in <utility> */
  string host = url.substr(0, found);
  string path = url.substr(found); /* keeps the leading '/' */
  return make_pair(host, path);
}

/* helper: get the filename, i.e. the last component of the path */
string getFileName(const string& path) {
  if (path.empty() || path[path.size() - 1] == '/')
    return "index.html"; /* not always correct, but not the point */
  size_t found = path.rfind('/');
  return path.substr(found + 1);
}

/* do the real work */
void pullContent(const pair<string, string>& parsedURL) {
  const string& host = parsedURL.first;  /* e.g. www.google.com */
  const string& path = parsedURL.second; /* e.g. /images/branding/(...).png */
  /* the six steps of downloading */
  /* step 1: connect to the server (port 80 for HTTP), get an SD */
  int clientSocket = createClientSocket(host, 80);
  if (clientSocket == -1) {
    cerr << "Could not connect to " << host << ":80" << endl;
    return;
  }
  /* step 2: layer a buffer over the "bare" SD */
  sockbuf sb(clientSocket);
  /* step 3: wrap a stream object around the buffer */
  iosockstream ss(&sb);
  /* step 4: send the request through the socket stream */
  issueRequest(ss, host, path);
  /* step 5: skip the header of the response (read from the stream) */
  skipHeader(ss);
  /* step 6: save the payload (read from the stream) */
  savePayload(ss, getFileName(path));
}

Here are the functions responsible for those steps.

/* send the request through the socket stream */
void issueRequest(iosockstream& ss, const string& host, const string& path) {
  ss << "GET " << path << " HTTP/1.0\r\n";
  ss << "Host: " << host << "\r\n";
  ss << "\r\n"; /* a blank line marks the end of the HTTP request */
  ss.flush();   /* push the buffered bytes out through the socket now */
}
/* This is not much different from what we typed into the terminal by hand
 * when talking to a server in 4.1.2.
 * Note that HTTP requires "\r\n" (CRLF), not just "\n", to end each line. */
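To see exactly which bytes issueRequest() puts on the wire, here is a minimal sketch that formats the same request into a std::string instead of a socket stream. The function name buildRequest and the sample host and path are hypothetical; only the request format comes from the code above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: produces the same bytes issueRequest() writes,
// but into a string so the wire format can be inspected.
std::string buildRequest(const std::string& host, const std::string& path) {
  std::ostringstream oss;
  oss << "GET " << path << " HTTP/1.0\r\n";  // request line
  oss << "Host: " << host << "\r\n";         // one header line
  oss << "\r\n";                             // blank line: end of request
  return oss.str();
}
```

For example, buildRequest("www.example.com", "/index.html") yields "GET /index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n": two CRLF-terminated lines plus the empty line that tells the server the request is complete.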
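/* skip the header of the response, which the server writes
 * to the socket; the header ends with a blank line ("\r\n") */
void skipHeader(iosockstream& ss) {
  string line;
  /* getline() strips the '\n' but keeps the '\r', so the blank line reads as "\r";
   * also stop if the stream fails (e.g. the server closed the connection) */
  while (getline(ss, line) && !line.empty() && line != "\r") {
    /* discard this header line */
  }
}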
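The header-skipping loop does not depend on sockets, so it can be tried on an in-memory response. The sketch below runs the same test against a std::istringstream; the names skipHeaderLines and bodyAfterHeader and the sample response text are hypothetical. Because the loop condition also checks that getline() succeeded, a truncated header cannot make it spin forever.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Same logic as skipHeader(), written against any std::istream.
void skipHeaderLines(std::istream& in) {
  std::string line;
  // getline() drops '\n' but keeps '\r', so the blank separator line is "\r"
  while (std::getline(in, line) && !line.empty() && line != "\r") {
    // discard this header line
  }
}

// Hypothetical helper: returns whatever follows the header (the payload).
std::string bodyAfterHeader(const std::string& response) {
  std::istringstream in(response);
  skipHeaderLines(in);
  std::ostringstream rest;
  rest << in.rdbuf();  // copy the remaining characters, if any
  return rest.str();
}
```

With "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello" as input, the three header lines (including the blank "\r") are consumed and "hello" is what remains.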
const size_t kBufferSize = 1024;
/* save the payload, e.g. text, images, videos... */
void savePayload(iosockstream& ss, const string& filename) {
  ofstream output(filename, ios::binary); /* the payload might not be text */
  /* continuously read from the socket and write to the file */
  size_t totalBytes = 0;
  while (!ss.fail()) {
    char buffer[kBufferSize];
    ss.read(buffer, sizeof(buffer));   /* read from stream into buffer */
    totalBytes += ss.gcount();         /* the last read may be short */
    output.write(buffer, ss.gcount()); /* write from buffer to file */
  }
  cout << "Total number of bytes fetched: " << totalBytes << endl;
}
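The read()/gcount() pattern in savePayload() can likewise be checked without a network. The sketch below, under the hypothetical name copyAll, copies any input stream to any output stream through a fixed-size buffer and counts the bytes; it is a generalization for illustration, not part of the course code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Copy `in` to `out` through a buffer of bufferSize bytes; return bytes copied.
std::size_t copyAll(std::istream& in, std::ostream& out, std::size_t bufferSize) {
  assert(bufferSize > 0);  // a zero-size read() never fails: infinite loop
  std::vector<char> buffer(bufferSize);
  std::size_t total = 0;
  while (!in.fail()) {
    in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    std::streamsize got = in.gcount();  // short (or 0) on the final read
    out.write(buffer.data(), got);
    total += static_cast<std::size_t>(got);
  }
  return total;
}
```

With 2500 bytes of input and a 1024-byte buffer, the loop reads 1024, 1024 and then 452 bytes. That last short read sets failbit, but gcount() still reports 452, so those bytes are written before the loop stops.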

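Note that getline() and read() block while there is nothing to read yet. Because our HTTP/1.0 request does not ask the server to keep the connection open, the server closes it after sending the response; read() then hits end-of-file, the stream enters the fail state, and the loop in savePayload() ends.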
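The sketch below illustrates why the read loop terminates. It assumes a POSIX system and uses socketpair() as an offline stand-in for a real TCP connection (an assumption so it runs without a network). One end plays the HTTP/1.0 server that writes its response and closes; the other drains it with read() until read() returns 0. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Read until read() returns 0 (EOF: the peer closed its end) or -1 (error).
// While the peer is still open but silent, each read() blocks.
std::string readUntilEOF(int fd) {
  std::string data;
  char buffer[256];
  ssize_t n;
  while ((n = read(fd, buffer, sizeof(buffer))) > 0)
    data.append(buffer, static_cast<std::size_t>(n));
  return data;
}

// Assumes `response` fits in the kernel socket buffer, since nothing
// reads concurrently while we write.
std::string demoServerClosesThenClientReads(const std::string& response) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";
  ssize_t sent = write(fds[1], response.data(), response.size());
  close(fds[1]);  // "server" is done; the "client" sees EOF after the data
  std::string received = readUntilEOF(fds[0]);
  close(fds[0]);
  return (sent == static_cast<ssize_t>(response.size())) ? received : "";
}
```

If fds[1] were never closed, the final read() would block forever instead of returning 0, which is exactly what would happen to web-get if the server kept the connection open.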

4.5.3 Some tidbits

4.6 Relevant system calls

4.6.1 Need-to-know: IP addresses, domain names

SIDE NOTE:
You can examine and edit your machine's network interface configuration with the command ifconfig.
You can send ICMP ECHO_REQUEST packets to a host of your choice with the command ping.

Refer to EE284 or CS144 for more details.

4.6.2 Hostname resolution

4.6.3 The sockaddr hierarchy: IP address/port number

/* sin_port and sin_addr are stored in network byte order (big-endian) */
struct sockaddr_in {          /* IPv4 socket addr./port record */
  unsigned short sin_family;  /* address family: AF_INET, 2 bytes */
  unsigned short sin_port;    /* port number, 2 bytes */
  struct in_addr sin_addr;    /* IP address, 4 bytes */
  unsigned char sin_zero[8];  /* padding the structure to 16 bytes */
};
struct sockaddr_in6 {         /* IPv6 socket addr./port record */
  unsigned short sin6_family; /* address family: AF_INET6 */
  unsigned short sin6_port;   /* port number */
  /* ... */
};
struct sockaddr {             /* generic socket addr./port record */
  unsigned short sa_family;   /* AF_INET or AF_INET6 */
  char sa_data[14];           /* other data, padding the structure to 16 bytes */
};
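The hierarchy exists because connect() and bind() take a generic struct sockaddr *: we fill in the specific sockaddr_in, pass its address cast to struct sockaddr *, and the family field tells the kernel how to interpret the remaining bytes. A small sketch of that, assuming a POSIX system; familyViaGenericView is a hypothetical name and 127.0.0.1 is an arbitrary example address.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

// Fill an IPv4 record for 127.0.0.1:port, then read the family back
// through the generic struct sockaddr view, as connect()/bind() would.
int familyViaGenericView(unsigned short port) {
  struct sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));  // also zeroes the sin_zero padding
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);          // network byte order
  if (inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) != 1) return -1;
  const struct sockaddr *generic =
      reinterpret_cast<const struct sockaddr *>(&addr);
  return generic->sa_family;            // AF_INET: "read the rest as IPv4"
}
```

The real headers declare these fields with types such as sa_family_t and in_port_t, and BSD-derived systems (including macOS) add a one-byte length field in front of the family, but on common systems both struct sockaddr_in and struct sockaddr are 16 bytes, as the simplified definitions above suggest.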

4.6.4 Implementing createClientSocket()

You don't need to write this yourself, but since it is not a system call (it is a helper built on top of several of them), we want you to know how it works.

#include ...
using namespace std;

int createClientSocket(const string& host, unsigned short port) {
  /* step 1: get the IP address of the intended host */
  struct hostent *he = gethostbyname(host.c_str());
  if (he == NULL) return -1; /* e.g. the host name could not be resolved */
  /* step 2: get an unused SD */
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) return -1;
  /* step 3: create an (IP, port) pair structure, and fill it with 0's */
  struct sockaddr_in server;
  memset(&server, 0, sizeof(server));
  /* step 4: populate the (IP, port) pair structure */
  server.sin_family = AF_INET;
  /* make the unsigned short big-endian, per networking standard */
  server.sin_port = htons(port);
  /* h_addr_list[0] points to the first IPv4 address, already big-endian */
  server.sin_addr = *((struct in_addr *)he->h_addr_list[0]);
  /* step 5: connect to the intended (IP, port) */
  if (connect(s, (struct sockaddr *)&server, sizeof(server)) == 0) return s;
  close(s); /* connect() failed, but s is still an open descriptor */
  return -1;
}
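gethostbyname() is obsolete (POSIX.1-2008 removed it in favor of getaddrinfo()). Below is a sketch of the same helper built on getaddrinfo() instead, a swapped-in technique rather than the course's code, assuming a POSIX system; createClientSocketGAI is a hypothetical name. getaddrinfo() fills in the port from the service string, already in network byte order, and may return several candidate addresses, so the sketch tries each until connect() succeeds.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Returns a connected SD, or -1 on failure.
int createClientSocketGAI(const std::string& host, unsigned short port) {
  struct addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;        // IPv4 only, like createClientSocket()
  hints.ai_socktype = SOCK_STREAM;  // TCP
  struct addrinfo *results = nullptr;
  std::string service = std::to_string(port);
  if (getaddrinfo(host.c_str(), service.c_str(), &hints, &results) != 0)
    return -1;  // the name could not be resolved
  int s = -1;
  for (struct addrinfo *ai = results; ai != nullptr; ai = ai->ai_next) {
    s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (s < 0) continue;
    if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0) break;  // connected
    close(s);  // this candidate failed; try the next one
    s = -1;
  }
  freeaddrinfo(results);  // the list is heap-allocated by getaddrinfo()
  return s;
}
```

Keeping hints.ai_family = AF_INET mirrors the IPv4-only original; AF_UNSPEC would also let IPv6 addresses through.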

4.6.5 Implementing createServerSocket()

createServerSocket() creates a socket that listens on a specific port on any of the machine's own IP addresses.

#include ...
using namespace std;

static const int kReuseAddresses = 1;   /* 1 means true here */
static const int kDefaultBacklog = 128; /* allow up to 128 pending clients to queue */

int createServerSocket(unsigned short port) {
  /* step 1: get an unused SD */
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) return -1;
  /* setsockopt() lets the port be reused right away,
   * e.g. when the server crashes and restarts */
  if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &kReuseAddresses, sizeof(int)) < 0) {
    close(s);
    return -1;
  }
  /* step 2: create an (IP, port) pair structure, and fill it with 0's */
  struct sockaddr_in server;
  memset(&server, 0, sizeof(server));
  /* step 3: populate the (IP, port) pair structure */
  server.sin_family = AF_INET;
  /* make the unsigned short big-endian, per networking standard */
  server.sin_port = htons(port);
  /* make the unsigned long big-endian, per networking standard;
   * INADDR_ANY accepts connections arriving on any of this machine's IP addresses */
  server.sin_addr.s_addr = htonl(INADDR_ANY);
  /* step 4: bind the (IP, port) pair to the socket,
   * and make it a listening socket */
  if (bind(s, (struct sockaddr *)&server, sizeof(server)) == 0 &&
      listen(s, kDefaultBacklog) == 0) return s;
  close(s);
  return -1;
}

4.6.6 Summary: steps of creating sockets

4.6.6.1 For client: create a client socket

step 1: get the IP address of the intended server: gethostbyname();

step 2: get an unused SD: socket();

step 3: create an (IP, port) pair structure of type sockaddr_in, and fill it with 0's;

step 4: populate the (IP, port) pair structure's fields: .sin_family, .sin_port, .sin_addr;

Note that .sin_port and .sin_addr should be the port number and IP address of the other party, i.e. the server, not of the client itself
(remember to store these two fields in big-endian, i.e. network byte order, e.g. with htons()).

step 5: connect to the intended (IP, port) through the socket: connect(). If it fails, remember to close the socket.
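Step 4's big-endian requirement can be checked by looking at the bytes actually stored in each field. This sketch assumes a POSIX system; the helper names portBytes and ipv4Bytes are hypothetical, and the port numbers and address in the example are arbitrary.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

// Copy out the raw bytes that would be stored in sin_port.
void portBytes(unsigned short port, unsigned char out[2]) {
  uint16_t wire = htons(port);  // host order -> network (big-endian) order
  std::memcpy(out, &wire, 2);
}

// Copy out the raw bytes that would be stored in sin_addr.
bool ipv4Bytes(const char *dotted, unsigned char out[4]) {
  struct in_addr addr;
  if (inet_pton(AF_INET, dotted, &addr) != 1) return false;  // not a valid IPv4
  std::memcpy(out, &addr.s_addr, 4);  // inet_pton already stores network order
  return true;
}
```

For port 80 (0x0050) the stored bytes are 0x00 then 0x50, and 8080 (0x1F90) is stored as 0x1F then 0x90, whatever the CPU's own byte order; "192.168.1.2" becomes 192, 168, 1, 2. On a little-endian x86 machine htons() swaps the two bytes; on a big-endian machine it does nothing.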

4.6.6.2 For server (1): create a listening socket

step 1: get an unused SD: socket();

step 2: create an (IP, port) pair structure of type sockaddr_in, and fill it with 0's;

step 3: populate the (IP, port) pair structure's fields: .sin_family, .sin_port, .sin_addr;

Note that .sin_port and .sin_addr are the server's own port number and local IP address, i.e. where it waits for clients, not the client's. To accept connections arriving on any of the machine's IP addresses, pass INADDR_ANY as the IP address
(remember to store these two fields in big-endian, i.e. network byte order, with htons() and htonl()).

step 4: bind the (IP, port) pair to the socket: bind(), and make it a listening socket: listen(). If either fails, remember to close the socket.

By default, an SD created by socket() is an active socket, meant for initiating connections with connect(). To make it a listening ("passive") socket, you need to call listen() on it after bind().

Note that listen() immediately returns; accept() is blocking.
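The difference between an active and a listening socket can be observed on the loopback interface. The sketch below, assuming a POSIX system, binds a socket to 127.0.0.1 on a port chosen by the kernel (passing port 0, an assumption made so the example cannot collide with a real server), then tries to connect before and after calling listen(). Both helper names are hypothetical.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Bind a new TCP socket to 127.0.0.1 on a kernel-chosen port, WITHOUT
// calling listen(). Returns the SD (or -1) and reports the port via *port.
int bindLoopbackEphemeral(unsigned short *port) {
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) return -1;
  struct sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(0);                       // 0: let the kernel choose
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
  socklen_t len = sizeof(addr);
  if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
      getsockname(s, (struct sockaddr *)&addr, &len) != 0) {
    close(s);
    return -1;
  }
  *port = ntohs(addr.sin_port);
  return s;
}

// Try to connect to 127.0.0.1:port. Returns 0 on success, else connect()'s errno.
int tryConnectLoopback(unsigned short port) {
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) return errno;
  struct sockaddr_in server;
  std::memset(&server, 0, sizeof(server));
  server.sin_family = AF_INET;
  server.sin_port = htons(port);
  server.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  int result = (connect(s, (struct sockaddr *)&server, sizeof(server)) == 0) ? 0 : errno;
  close(s);  // errno was saved above, before close() could change it
  return result;
}
```

Before listen(), the attempt is refused (ECONNREFUSED) even though the port is bound. After listen(), which returns right away, connect() succeeds: the kernel completes the handshake and queues the connection in the backlog, even though nobody has called accept() yet.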

4.6.6.3 For server (2): create a connected socket

Call accept() on the server's (only) listening SD. accept() blocks the calling thread until a client's connection request arrives; at that point it creates a new connected SD for that client and returns it. The connected SD should be closed once the server is done with that client.
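Here is a single-threaded sketch of this sequence, assuming a POSIX system and the loopback address 127.0.0.1; acceptDemo and AcceptDemoResult are hypothetical names. Because the client's connect() has already been queued in the backlog, the following accept() returns at once with a brand-new connected SD, separate from the listening SD, and the data the client writes arrives on that connected SD.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

struct AcceptDemoResult {
  bool distinctSDs;      // was the connected SD different from the listening SD?
  std::string received;  // bytes the server read from its connected SD
};

AcceptDemoResult acceptDemo(const std::string& message) {
  AcceptDemoResult result{false, ""};
  // server side: a listening SD on a kernel-chosen loopback port
  int listener = socket(AF_INET, SOCK_STREAM, 0);
  if (listener < 0) return result;
  struct sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(0);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  socklen_t len = sizeof(addr);
  if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
      listen(listener, 1) != 0 ||
      getsockname(listener, (struct sockaddr *)&addr, &len) != 0) {
    close(listener);
    return result;
  }
  // client side: connect() completes once the kernel queues the connection
  int client = socket(AF_INET, SOCK_STREAM, 0);
  if (client < 0 || connect(client, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
    if (client >= 0) close(client);
    close(listener);
    return result;
  }
  // accept() would block on an empty queue; here a connection is waiting
  int connected = accept(listener, nullptr, nullptr);
  if (connected >= 0) {
    result.distinctSDs = (connected != listener);
    ssize_t sent = write(client, message.data(), message.size());
    close(client);  // client done: the server's read() then reaches EOF
    char buffer[256];
    ssize_t n;
    while ((n = read(connected, buffer, sizeof(buffer))) > 0)
      result.received.append(buffer, static_cast<size_t>(n));
    if (sent != static_cast<ssize_t>(message.size())) result.received.clear();
    close(connected);  // server done with this client
  } else {
    close(client);
  }
  close(listener);
  return result;
}
```

In a real server, accept() runs in a loop (often handing each connected SD to its own thread); the listening SD stays open across iterations, while each connected SD is closed when its client is done.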


a client process (IP, port)                a server process (IP, port)
┌───────────────────────┐                  ┌───────────────────────┐
│                       │                  │[]listening SD         │
│                       │                  │                       │
│                       │ another client ◀═▶[]connected SD ◀─▶ R/W │
│                       │       :          │     :                 │
│ R/W ◀─▶ client SD[]◀═══════socket════════▶[]connected SD ◀─▶ R/W │
└───────────────────────┘                  └───────────────────────┘
            A successfully connected client-server link
EOF