Tweeter Project Introduction
Lecture Notes for CS 190
Spring 2015
John Ousterhout
What Tweeter Does
- Twitter-like Web service: Tweeter
- Users identified with 64-bit unique integers
- Friendships:
- If you are my friend, then I am your follower
- Status updates (tweets)
- Associated with the user who creates them
- Visible to followers of that user
- Each tweet has a 64-bit unique identifier
- Tweet ids indicate order: higher id => later tweet
- Applications access Tweeter using HTTP and JSON:
- HTTP protocol for requests and responses
- Requests described with URLs, query values, form data
- Structured data returned in JSON format
- Web services interface only: no HTML output
- Each request described with a URL that specifies an operation
and parameters, such as:
http://localhost:8080/friendships/create?my_id=100&user_id=200
Fields of URL:
- Scheme (http:): identifies protocol used to fetch the content.
- http: is the most common scheme and is the only one used
in this class.
- Host name (//host.company.com or //localhost): name of
the machine running the desired server.
- Server's port number (8080): allows multiple servers to run
on the same machine. Normal Web servers usually run on port 80 (the default).
- Hierarchical portion (/friendships/create): identifies a
particular request, such as create a new friendship.
- Query info (?my_id=100&user_id=200): provides parameters for
the request
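To make the fields above concrete, here is a minimal sketch that splits a URL by hand. The class name UrlFields and its members are hypothetical, and the code assumes a well-formed http URL; it is an illustration, not a complete URL parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits an http URL into the fields listed above. */
final class UrlFields {
    String scheme;
    String host;
    int port = 80;                       // default port for normal Web servers
    String path;                         // hierarchical portion
    Map<String, String> query = new LinkedHashMap<>();

    static UrlFields parse(String url) {
        UrlFields f = new UrlFields();
        int schemeEnd = url.indexOf("://");          // assumes the scheme is present
        f.scheme = url.substring(0, schemeEnd);
        String rest = url.substring(schemeEnd + 3);

        int pathStart = rest.indexOf('/');
        String hostPort = (pathStart < 0) ? rest : rest.substring(0, pathStart);
        String pathQuery = (pathStart < 0) ? "/" : rest.substring(pathStart);

        int colon = hostPort.indexOf(':');
        if (colon >= 0) {
            f.host = hostPort.substring(0, colon);
            f.port = Integer.parseInt(hostPort.substring(colon + 1));
        } else {
            f.host = hostPort;
        }

        int q = pathQuery.indexOf('?');
        f.path = (q < 0) ? pathQuery : pathQuery.substring(0, q);
        if (q >= 0) {
            for (String pair : pathQuery.substring(q + 1).split("&")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                String name = (eq < 0) ? pair : pair.substring(0, eq);
                String value = (eq < 0) ? "" : pair.substring(eq + 1);
                f.query.put(name, value);            // values are still URL-encoded here
            }
        }
        return f;
    }

    public static void main(String[] args) {
        UrlFields f = parse("http://localhost:8080/friendships/create?my_id=100&user_id=200");
        System.out.println(f.scheme + " " + f.host + " " + f.port + " " + f.path + " " + f.query);
    }
}
```

A server normally sees only the hierarchical portion and query, so in practice just the last part of this logic would be needed.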
- URL encoding:
- If a query value contains any character other than A-Z, a-z,
0-9, or any of -_.~ it must be represented as %xx,
where xx is the hexadecimal value of the character.
- " " becomes %20
- "&" becomes %26, etc.
- Example:
/statuses/update?status=Stuck%20in%20traffic
- When extracting information from URLs, you must reverse
the escaping.
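The encoding rule and its reverse might be sketched as follows. UrlCodec is a hypothetical name, and the sketch assumes characters are converted to bytes with UTF-8, which these notes do not specify. It also does not treat "+" as a space, which some clients send in form data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for URL encoding and decoding of query values. */
final class UrlCodec {
    // Characters that may appear unescaped: A-Z, a-z, 0-9, and -_.~
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";

    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {   // UTF-8 is an assumption
            char c = (char) (b & 0xff);
            if (UNRESERVED.indexOf(c) >= 0) {
                out.append(c);
            } else {
                out.append('%').append(String.format("%02X", b & 0xff));
            }
        }
        return out.toString();
    }

    static String decode(String encoded) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != '%') {
                bytes.write((byte) c);               // assumes unescaped characters are ASCII
                continue;
            }
            if (i + 2 >= encoded.length()) {
                throw new IllegalArgumentException("truncated escape at index " + i);
            }
            int hi = Character.digit(encoded.charAt(i + 1), 16);
            int lo = Character.digit(encoded.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("bad escape at index " + i);
            }
            bytes.write((hi << 4) | lo);
            i += 2;                                  // skip the two hex digits
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("Stuck in traffic");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```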
- Requests are sent to servers using HTTP: HyperText Transfer Protocol
- Simple request-response protocol, sent using TCP/IP sockets.
- Sample request (see slide):
- First line contains method, URL (hierarchical portion and query), and protocol version number
- GET method: read information from server. Should have no side
effects.
- POST method: uploads data from the client (typically a browser
sending form data) to the server, returns information from the
server. Likely to have side effects.
- There are several other methods defined besides these two, but
they are not needed for this project.
- Headers: name-value pairs providing various information that may be
useful to the server.
- A request can also contain data following the headers
- GET method doesn't have any data
- POST method may include additional parameters in body,
in the same format as in URLs (my_id=100&user_id=200)
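The request layout described above could be read with something like the following sketch. HttpRequestSketch is a hypothetical name. The sketch relies on the standard Content-Length header to find the end of the body, and it assumes the body contains only single-byte characters, so that a character count equals a byte count.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for one parsed HTTP request. */
final class HttpRequestSketch {
    String method;
    String target;       // hierarchical portion plus query
    String version;
    Map<String, String> headers = new LinkedHashMap<>();
    String body = "";

    static HttpRequestSketch read(BufferedReader in) throws IOException {
        HttpRequestSketch req = new HttpRequestSketch();

        // First line: method, URL, version number.
        String first = in.readLine();
        if (first == null) throw new IOException("empty request");
        String[] parts = first.split(" ");
        if (parts.length != 3) throw new IOException("malformed request line: " + first);
        req.method = parts[0];
        req.target = parts[1];
        req.version = parts[2];

        // Headers: name-value pairs, ended by a blank line.
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon < 0) throw new IOException("malformed header: " + line);
            // Header names are case-insensitive, so store them in lower case.
            req.headers.put(line.substring(0, colon).trim().toLowerCase(),
                            line.substring(colon + 1).trim());
        }

        // Optional data following the headers (e.g. POST parameters).
        String length = req.headers.get("content-length");
        if (length != null) {
            char[] buf = new char[Integer.parseInt(length)];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) break;                    // connection closed early
                n += r;
            }
            req.body = new String(buf, 0, n);
        }
        return req;
    }

    public static void main(String[] args) throws IOException {
        String raw = "POST /friendships/create HTTP/1.1\r\n"
                   + "Host: localhost:8080\r\n"
                   + "Content-Length: 21\r\n"
                   + "\r\n"
                   + "my_id=100&user_id=200";
        HttpRequestSketch req = read(new BufferedReader(new StringReader(raw)));
        System.out.println(req.method + " " + req.target + " " + req.body);
    }
}
```

In a real server the BufferedReader would wrap the socket's input stream rather than a string.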
- Sample response (see slide):
- First line contains protocol version number, numerical status code,
textual explanation.
- Headers have same general format as for requests
- Blank line separates headers from response data.
- Response body will be in JSON format for this class.
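A response with this layout might be produced as in the sketch below. HttpResponseSketch is a hypothetical name, and the choice of headers (Content-Type and Content-Length) is an assumption; the project writeup may require others.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that writes one HTTP response carrying a JSON body. */
final class HttpResponseSketch {
    static void write(OutputStream out, int status, String reason, String jsonBody)
            throws IOException {
        byte[] body = jsonBody.getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 " + status + " " + reason + "\r\n"   // version, code, explanation
                    + "Content-Type: application/json\r\n"
                    + "Content-Length: " + body.length + "\r\n"      // length in bytes
                    + "\r\n";                                        // blank line ends the headers
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, 200, "OK", "[100,200]");
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```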
- Tweeter responses are in JSON format (JavaScript Object Notation):
- Commonly used to transmit structured data in Web applications
- String format that describes a JavaScript literal value:
- Simple values: strings, numbers
- Arrays
- Objects (collections of named values)
- See http://json.org/ for details
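As an illustration of generating JSON by hand, the sketch below builds an array of numbers and an object with one named value. The class name and the "ids" field are hypothetical; the actual response formats are given in the project writeup. The object method assumes its name needs no escaping.

```java
/** Hypothetical helper that builds small JSON values as strings. */
final class JsonIdsSketch {
    /** Returns a JSON array of numbers, e.g. [100,200]. */
    static String array(long[] ids) {
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < ids.length; i++) {
            if (i > 0) out.append(',');
            out.append(ids[i]);
        }
        return out.append(']').toString();
    }

    /** Returns a JSON object with one named value; jsonValue must already be valid JSON. */
    static String object(String name, String jsonValue) {
        return "{\"" + name + "\":" + jsonValue + "}";
    }

    public static void main(String[] args) {
        System.out.println(object("ids", array(new long[] {100, 200})));
    }
}
```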
- URLs/operations (see slide)
- Users are identified by 64-bit integers
- Each tweet has a 64-bit unique identifier that you must
assign, in order
- Create and delete follower relationships:
- /friendships/create
- /friendships/destroy
- Query follower relationships
- /friends/ids.json
- /followers/ids.json
- Create tweets:
- /statuses/update
- Retrieve recent tweets:
- /statuses/user_timeline.json
- /statuses/home_timeline.json
Implementing Tweeter
- Most important goal: clean, simple, obvious code
- Must be well commented
- Write in Java (suggest using Eclipse)
- Teams of two
- Must build your own mechanisms for implementing the HTTP protocol
and for generating JSON.
- Do not use existing Java libraries
- Can use basic Java classes such as Socket,
InputStreamReader, and OutputStreamWriter.
- If in doubt about what existing classes you may use, ask me.
- Durability
- Must store user info and tweets in files
- Must recover information from files if the server crashes and
restarts
- See project writeup for more details.
- Tweets are never deleted.
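One possible approach to durability, sketched under assumptions, is an append-only log that is replayed at startup. The record format here (id, user id, and URL-encoded text on one line) and the class name TweetLogSketch are hypothetical; the project writeup is the authority on what must be stored. Closing the writer hands the data to the operating system but does not force it to disk.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only tweet log, replayed when the server restarts. */
final class TweetLogSketch {
    private final File file;
    private long nextId = 1;
    final List<String[]> tweets = new ArrayList<>();   // each entry: {id, userId, encodedText}

    TweetLogSketch(File file) throws IOException {
        this.file = file;
        if (!file.exists()) return;
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] rec = line.split(" ", 3);
                if (rec.length != 3) continue;          // skip a line that lacks all three fields
                tweets.add(rec);
                // Keep ids increasing across restarts: higher id => later tweet.
                nextId = Math.max(nextId, Long.parseLong(rec[0]) + 1);
            }
        }
    }

    /** Appends one tweet; encodedText must contain no spaces or newlines. */
    long append(long userId, String encodedText) throws IOException {
        long id = nextId++;
        try (FileWriter out = new FileWriter(file, true)) {   // true = append mode
            out.write(id + " " + userId + " " + encodedText + "\n");
        }
        tweets.add(new String[] {Long.toString(id), Long.toString(userId), encodedText});
        return id;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("tweets", ".log");
        TweetLogSketch log = new TweetLogSketch(f);
        log.append(100, "Stuck%20in%20traffic");
        TweetLogSketch recovered = new TweetLogSketch(f);     // simulates a restart
        System.out.println(recovered.tweets.size() + " tweet(s) recovered");
        f.delete();
    }
}
```

Reopening the file for every tweet keeps the sketch short; a real server would likely keep the file open.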
- Consider a few performance issues in your design:
- Overall, application must have "reasonable" performance.
- Must handle millions of users
- Must handle millions of tweets per day.
- Users may have large numbers of followers (millions),
but users are unlikely to have more than a few hundred friends.
- Important to cache frequently used information in memory:
- Assume 100-200GB of DRAM
- Can keep all user info in memory all the time
- Can keep millions of tweets in memory (but not all tweets
for all time)
- Entire application runs on a single server.
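Keeping only recent tweets in memory could look like the following sketch: a bounded cache that drops the oldest tweet once a capacity is reached. The class name, the capacity parameter, and the choice to store tweet text as a String are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;

/** Hypothetical in-memory cache holding only the most recently added tweets. */
final class RecentTweetCache {
    private final int capacity;
    private final ArrayDeque<Long> order = new ArrayDeque<>();   // ids, oldest first
    private final HashMap<Long, String> text = new HashMap<>();

    RecentTweetCache(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Adds a tweet, evicting the oldest cached tweet if the cache is full. */
    void put(long tweetId, String tweetText) {
        if (order.size() == capacity) {
            text.remove(order.removeFirst());
        }
        order.addLast(tweetId);
        text.put(tweetId, tweetText);
    }

    /** Returns the cached text, or null if the tweet must be read from a file. */
    String get(long tweetId) {
        return text.get(tweetId);
    }

    public static void main(String[] args) {
        RecentTweetCache cache = new RecentTweetCache(2);
        cache.put(1, "first");
        cache.put(2, "second");
        cache.put(3, "third");                 // evicts tweet 1
        System.out.println(cache.get(1) + " " + cache.get(3));
    }
}
```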
- Escaping: special characters must be encoded, such as "&" or "=" in
query values or "\" or double-quote in JSON strings.
- HTTP requests (query values, form data): URL encoding (already
described).
- JSON strings: you must escape special characters (backslash,
double-quote, any character with a value less than 32, which
includes control characters such as newline and tab, and
the character with value 0x7f)
- E.g., replace "\" with "\\"
- E.g., replace character code 1 with "\u0001"
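JSON string escaping might be implemented along these lines. JsonEscape is a hypothetical name. The sketch escapes backslash, double-quote, and every character below 32, writing control characters without a short form as \uXXXX, which is the numeric escape that JSON defines. Escaping DEL (0x7f) as well is an assumption of this sketch.

```java
/** Hypothetical helper that turns a Java string into a quoted JSON string. */
final class JsonEscape {
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;   // backslash becomes \\
                case '"':  out.append("\\\""); break;   // double-quote becomes \"
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20 || c == 0x7f) {
                        // Remaining control characters use the four-digit hex form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("She said \"hi\"\tand left\\"));
    }
}
```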