In this project you and a teammate will implement Tweeter, a Twitter-like social networking service that allows users to send and read short messages called tweets. Tweeter will be accessed over the Web using the HTTP protocol, but it will not return HTML Web pages for display in a browser. Instead, it returns data in JSON (JavaScript Object Notation) format; this structured form is intended for use by programs such as mobile applications.
Each Tweeter user is identified by a unique 64-bit integer. You do not need to worry about how these identifiers are assigned; you can assume that each user knows his or her identifier as well as the identifiers of anyone they wish to follow. Tweeter does not store user information such as name, address, etc.
Tweeter supports friend/follower relationships among users. Any user A can declare that any other user B is their friend. This means that A would like to read any tweets generated by B. If B is A's friend, then A is said to be a follower of B. Each user may follow any number of other users; friendships may be created and deleted at any time.
A tweet is a short message (140 characters or fewer) generated by a user. Each tweet is identified by a unique 64-bit integer, and ids must be assigned in increasing order: if one tweet is created after another, then it must have a larger id than the other.
Tweeter allows tweets to be read in two ways. First, it provides a mechanism for reading recent tweets generated by a particular user. Second, it provides a mechanism for reading recent tweets generated either by a user or by any of that user's friends. This second mechanism is the one most typically used by interactive applications.
Applications interact with Tweeter over the network using the HTTP protocol. See the class lecture notes for basic information on HTTP. Each HTTP request specifies an operation for Tweeter to perform, such as "make user 100 a follower of user 200". This information is encoded in the URL for the request. For example, if Tweeter is listening on port 8080 of the local machine, the preceding request can be invoked with an HTTP POST request that specifies the following URL:
http://localhost:8080/friendships/create?my_id=100&user_id=200
The hierarchical portion of the URL (/friendships/create) specifies an operation to perform, and the query values (my_id and user_id) specify parameters for that operation.
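In an HTTP request line the target normally arrives without the scheme, host, and port (for example, POST /friendships/create?my_id=100&user_id=200 HTTP/1.1), so the server only has to separate the path from the query. The Java sketch below does that split; the class name RequestTarget is hypothetical, not part of any required API.

```java
import java.util.Objects;

// Hypothetical helper: splits a request target such as
// "/friendships/create?my_id=100&user_id=200" into the hierarchical path
// and the raw, still-encoded query string.
final class RequestTarget {
    final String path;   // e.g. "/friendships/create"
    final String query;  // e.g. "my_id=100&user_id=200", or "" when there is no '?'

    private RequestTarget(String path, String query) {
        this.path = path;
        this.query = query;
    }

    static RequestTarget parse(String target) {
        Objects.requireNonNull(target, "target");
        int q = target.indexOf('?');
        if (q < 0) {
            return new RequestTarget(target, "");
        }
        // Everything before the first '?' names the operation; the rest holds parameters.
        return new RequestTarget(target.substring(0, q), target.substring(q + 1));
    }
}
```

The path can then be used to choose the operation, and the query string handed to a parameter decoder.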
The HTTP GET method is used for requests that retrieve information from the server without making any changes, such as /followers/ids.json, which returns ids for all of the users who are following a particular user. Requests that modify state on the server, such as /friendships/create, use the HTTP POST method. In these requests, parameters can be specified either as query values in the URL or in the body of the request. If parameters are specified in the body, they are encoded using the same notation as query values in the URL (e.g., my_id=100&user_id=200).
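Because a POST body uses the same name=value&name=value notation as a query string, one parser can serve both. This Java sketch (the class name FormParams is hypothetical) uses java.net.URLDecoder; its decode(String, Charset) overload requires Java 10 or later. URLDecoder also turns a bare + into a space, which is harmless here because, under the escaping rules below, a literal + always arrives as %2B. The merge method assumes that a body value overrides a query value with the same name; the handout does not specify this case.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: decodes "name=value&name=value" pairs, the notation
// used both for URL query values and for POST request bodies.
final class FormParams {
    static Map<String, String> parse(String encoded) {
        Map<String, String> params = new HashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue;  // tolerate stray separators such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // URLDecoder turns %XX escapes back into the original characters.
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Combines query values and body values; body values win on conflict (assumption).
    static Map<String, String> merge(String query, String body) {
        Map<String, String> all = parse(query);
        all.putAll(parse(body));
        return all;
    }
}
```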
Note that if an input parameter contains any characters other than letters, digits, hyphen (-), underscore (_), period (.), or tilde (~), those characters will be escaped using URL encoding. See the lecture notes for details on URL encoding. For example, if the status parameter for a tweet is actually "I'm on my way home", it will be encoded in the URL like this:
status=I%27m%20on%20my%20way%20home
Your Tweeter code must properly decode these values to extract the original text.
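To show what decoding involves, here is a hand-written percent-decoder in Java (the class name PercentDecoder is hypothetical); in practice the standard java.net.URLDecoder class performs the same decoding. Each %XX escape is one byte, and this sketch assumes those bytes form UTF-8 text, since the handout does not name a character set.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written decoder for %XX escapes, for illustration only.
final class PercentDecoder {
    static String decode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape at index " + i);
                }
                bytes.write((hi << 4) | lo);  // one byte of the (assumed UTF-8) text
                i += 2;                       // skip the two hex digits
            } else {
                // Unescaped characters are ASCII letters, digits, and "-_.~",
                // so writing the char as a single byte is safe for valid input.
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Decoding byte by byte before converting to a String matters for non-ASCII text: a character such as é arrives as two escapes (%C3%A9) that must be combined.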
Tweeter returns information back to applications using JSON format. See the lecture notes for details on the format of JSON objects, and see the descriptions of individual requests below for details on the specific values returned by each request. The JSON is returned as the body of the HTTP response; the response must include a Content-type header with value application/json, which indicates to the recipient that the response is encoded in JSON format.
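A minimal Java sketch of writing a JSON response to a socket's output stream (the class name JsonResponse is hypothetical). It assumes one request per connection, hence the Connection: close header, which the handout does not require. HTTP header names are case-insensitive, so Content-Type and Content-type are equivalent. Content-Length is computed from the UTF-8 bytes rather than the character count, because tweet text may contain non-ASCII characters.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that writes a complete HTTP/1.1 response with a JSON body.
final class JsonResponse {
    static void write(OutputStream out, int status, String reason, String json)
            throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 " + status + " " + reason + "\r\n"
                + "Content-Type: application/json\r\n"
                // Content-Length counts bytes, not characters.
                + "Content-Length: " + body.length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";  // blank line separates headers from body
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }
}
```

A server might call it as JsonResponse.write(socket.getOutputStream(), 200, "OK", "{}"), where socket is the accepted connection.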
Each JSON response consists of a single JSON object with zero or more named properties. For example, here is a response containing a single property named ids, whose value is an array of user identifiers:
{"ids": [44, 99, 307, 8216]}
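Since the response objects are small and regular, they can be produced without a JSON library. This hypothetical Java helper (IdsJson) renders a collection of user ids in the shape shown above, using java.util.StringJoiner.

```java
import java.util.Collection;
import java.util.StringJoiner;

// Hypothetical helper that renders user ids as {"ids": [...]}.
final class IdsJson {
    static String format(Collection<Long> ids) {
        // The "[" prefix and "]" suffix are emitted even when there are no ids.
        StringJoiner array = new StringJoiner(", ", "[", "]");
        for (long id : ids) {
            array.add(Long.toString(id));
        }
        return "{\"ids\": " + array + "}";
    }
}
```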
If an error is detected while handling a request, such as a missing parameter, the JSON response contains a single property whose name is error and whose value is a string describing the problem, such as:
{"error": "missing parameter: user_id"}
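Error messages, like tweet text, must be escaped before they are placed inside a JSON string: per the JSON specification, quotation marks, backslashes, and control characters below U+0020 need escapes. A sketch in Java, with the hypothetical class name JsonError:

```java
// Hypothetical helper for building JSON strings by hand.
final class JsonError {
    // Returns s as a quoted JSON string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters: backslash-u plus 4 hex digits.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the error object described above: {"error": "..."}.
    static String error(String message) {
        return "{\"error\": " + quote(message) + "}";
    }
}
```

The same quote method can be reused for the text property of tweets.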
Here are the specific URLs that your Tweeter server must support, along with their parameters and results. Note that each request must use a specific HTTP method (GET or POST); it is an error for a request to use the wrong method. These requests are very similar to the requests supported by the real Twitter Web service (https://dev.twitter.com/rest/public).
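Because each URL accepts exactly one method, a server can keep a table from path to (method, handler) and reject mismatches before running the handler. The Router class and its error texts are hypothetical; only the rule that a wrong method is an error comes from this handout. Which HTTP status code to pair with an error body is left open here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical router: each path accepts exactly one HTTP method.
final class Router {
    private static final class Route {
        final String method;
        final Function<Map<String, String>, String> handler;

        Route(String method, Function<Map<String, String>, String> handler) {
            this.method = method;
            this.handler = handler;
        }
    }

    private final Map<String, Route> routes = new HashMap<>();

    void add(String method, String path, Function<Map<String, String>, String> handler) {
        routes.put(path, new Route(method, handler));
    }

    // Returns the JSON body to send for this request.
    String dispatch(String method, String path, Map<String, String> params) {
        Route route = routes.get(path);
        if (route == null) {
            return "{\"error\": \"unknown request\"}";
        }
        if (!route.method.equals(method)) {
            // route.method is one of our own constants ("GET" or "POST"),
            // so it needs no JSON escaping.
            return "{\"error\": \"wrong HTTP method; expected " + route.method + "\"}";
        }
        return route.handler.apply(params);
    }
}
```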
URL: /friendships/create
Method: POST
Parameters:
my_id | Identifier of a user that will become a follower. |
user_id | Identifier of a user that will be followed. |
Make user my_id a follower of user user_id; if it is already a follower, leave it that way. Returns an empty JSON object ({}).
Method: POST
Parameters:
my_id | Identifier of the following user. |
user_id | Identifier of the followed user. |
If my_id is currently a follower of user_id, then delete that friendship; if my_id is not currently a follower of user_id, do nothing. Returns an empty JSON object ({}).
URL: /followers/ids.json
Method: GET
Parameters:
user_id | Identifier for a user. |
Returns identifiers for all of the users who are followers of user_id. The identifiers are returned as an array in a property named ids. Here is an example result:
{"ids": [44, 99, 307, 8216]}
URL: /friends/ids.json
Method: GET
Parameters:
user_id | Identifier for a user. |
Returns identifiers for all of the users who are friends of user_id (i.e., all of the users for whom user_id is a follower). The identifiers are returned as an array in a property named ids. Here is an example result:
{"ids": [44, 99, 307, 8216]}
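One way to back the four friendship requests above is to keep two indexes, so that both "whom does X follow" (/friends/ids.json) and "who follows X" (/followers/ids.json) are single map lookups. This Java sketch (FriendGraph is a hypothetical name) is in-memory only; durability has to be layered on top. The methods are synchronized on the assumption that requests are handled on several threads, and sets are copied before being returned so callers never observe a concurrent update.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory friendship index with one map per direction.
final class FriendGraph {
    private final Map<Long, Set<Long>> friends = new HashMap<>();    // follower -> users it follows
    private final Map<Long, Set<Long>> followers = new HashMap<>();  // user -> users following it

    // Idempotent: adding an existing friendship changes nothing (sets ignore duplicates).
    synchronized void follow(long myId, long userId) {
        friends.computeIfAbsent(myId, k -> new TreeSet<>()).add(userId);
        followers.computeIfAbsent(userId, k -> new TreeSet<>()).add(myId);
    }

    // Removes the friendship if present; otherwise does nothing.
    synchronized void unfollow(long myId, long userId) {
        Set<Long> f = friends.get(myId);
        if (f != null) {
            f.remove(userId);
        }
        Set<Long> g = followers.get(userId);
        if (g != null) {
            g.remove(myId);
        }
    }

    synchronized Set<Long> friendsOf(long userId) {
        return Collections.unmodifiableSet(
                new TreeSet<>(friends.getOrDefault(userId, Collections.emptySet())));
    }

    synchronized Set<Long> followersOf(long userId) {
        return Collections.unmodifiableSet(
                new TreeSet<>(followers.getOrDefault(userId, Collections.emptySet())));
    }
}
```

Both maps must be updated together; keeping them under one lock ensures that a follower query and a friend query never disagree.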
URL: /statuses/update
Method: POST
Parameters:
my_id | Identifier of the user that created the tweet. |
status | The contents of the tweet (a message of no more than 140 characters). |
Create a new tweet for user my_id with the given status message. The tweet must be assigned an identifier higher than the identifier of any tweet created before it. Returns an empty JSON object ({}).
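A sketch of id allocation and status validation in Java, assuming the server seeds the counter at startup with the largest tweet id recovered from the workspace (TweetIds and StatusCheck are hypothetical names). AtomicLong.incrementAndGet ensures that concurrent requests never receive the same id and that each later call returns a larger id. If tweets are also written to disk, allocating the id and appending the tweet should happen under the same lock so the stored order matches id order. The length check counts Unicode code points, one possible reading of "140 characters" that the handout does not pin down.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical allocator: hands out strictly increasing tweet ids.
final class TweetIds {
    private final AtomicLong last;

    // largestIdOnDisk: highest id recovered from the workspace at startup (0 if none).
    TweetIds(long largestIdOnDisk) {
        this.last = new AtomicLong(largestIdOnDisk);
    }

    // Atomic, so two concurrent requests can never receive the same id.
    long next() {
        return last.incrementAndGet();
    }
}

// Hypothetical validation for the status parameter of a new tweet.
final class StatusCheck {
    static final int MAX_CHARS = 140;

    static void check(String status) {
        if (status == null) {
            throw new IllegalArgumentException("missing parameter: status");
        }
        // Count code points so that a character outside the Basic Multilingual
        // Plane (for example, many emoji) counts once rather than twice.
        if (status.codePointCount(0, status.length()) > MAX_CHARS) {
            throw new IllegalArgumentException("status longer than " + MAX_CHARS + " characters");
        }
    }
}
```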
URL: /statuses/home_timeline.json
Method: GET
Parameters:
my_id | Identifier for a user. |
count | Maximum number of tweets to return (optional: defaults to 20). |
max_id | Optional: if specified, the returned tweets will have ids no higher than this. |
Returns the most recent tweets (i.e., those with the highest tweet ids) created by my_id and all of my_id's friends, subject to the count and max_id parameters. The tweets are returned as an array in a property named tweets, and each tweet is described with four properties: id (the tweet's identifier), user (the identifier for the user that created the tweet), time (the time when the tweet was created), and text (the contents of the tweet message). The returned tweets must be in reverse chronological order (most recent tweet first). Here is an example result:
{"tweets": [ {"id": 20115, "user": 84, "time": "Mon Oct 27 18:02:57 PDT 2014", "text": "On my way home"}, {"id": 20007, "user": 18, "time": "Mon Oct 27 17:13:22 PDT 2014", "text": "Chillin' by the pool"}, {"id": 18442, "user": 84, "time": "Sun Oct 26 20:52:35 PDT 2014", "text": "Just saw a flying saucer!"} ]}
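A straightforward Java sketch of the home timeline, assuming each user's tweets are kept in a list in increasing id order, and that the caller has already substituted 20 for a missing count and Long.MAX_VALUE for a missing max_id (Tweet and HomeTimeline are hypothetical names). Each author contributes at most count candidates, since no more could survive the final cut; the candidates are then sorted newest first. For users with very many friends, a priority-queue merge of the per-author lists would avoid sorting every candidate.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical immutable tweet record.
final class Tweet {
    final long id;
    final long user;
    final String time;
    final String text;

    Tweet(long id, long user, String time, String text) {
        this.id = id;
        this.user = user;
        this.time = time;
        this.text = text;
    }
}

final class HomeTimeline {
    // tweetsByUser: each user's tweets in increasing id order (assumed layout).
    // Returns up to count tweets with id <= maxId by myId or friendIds, newest first.
    static List<Tweet> home(long myId, Collection<Long> friendIds,
                            Map<Long, List<Tweet>> tweetsByUser, int count, long maxId) {
        Set<Long> authors = new HashSet<>(friendIds);
        authors.add(myId);
        List<Tweet> candidates = new ArrayList<>();
        for (long author : authors) {
            List<Tweet> own = tweetsByUser.getOrDefault(author, List.of());
            int taken = 0;
            // Walk from the newest tweet backwards, skipping tweets newer than maxId.
            for (int i = own.size() - 1; i >= 0 && taken < count; i--) {
                Tweet t = own.get(i);
                if (t.id <= maxId) {
                    candidates.add(t);
                    taken++;
                }
            }
        }
        // Reverse chronological order is the same as descending tweet id.
        candidates.sort(Comparator.comparingLong((Tweet t) -> t.id).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(count, candidates.size())));
    }
}
```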
Method: GET
Parameters:
my_id | Identifier for a user. |
count | Maximum number of tweets to return (optional: defaults to 20). |
max_id | Optional: if specified, the returned tweets will have ids no higher than this. |
This request is similar to /statuses/home_timeline.json except that only tweets created by user my_id are returned (my_id's friends are not considered). Here is an example result:
{"tweets": [ {"id": 20115, "user": 84, "time": "Mon Oct 27 18:02:57 PDT 2014", "text": "On my way home"}, {"id": 18442, "user": 84, "time": "Sun Oct 26 20:52:35 PDT 2014", "text": "Just saw a flying saucer!"} ]}
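For a single user's timeline, if that user's tweet ids are stored in increasing order, the position of max_id can be found with a binary search rather than a scan. This Java sketch (UserTimeline is a hypothetical name) returns ids only; a real server would keep the tweet objects alongside them.

```java
import java.util.ArrayList;
import java.util.List;

final class UserTimeline {
    // ids: one user's tweet ids in increasing order (assumed layout).
    // Returns up to count ids that are <= maxId, newest first.
    static List<Long> recent(long[] ids, int count, long maxId) {
        // Binary search for the first index whose id is greater than maxId.
        int lo = 0;
        int hi = ids.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ids[mid] <= maxId) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Everything before lo qualifies; walk backwards to get newest first.
        List<Long> out = new ArrayList<>();
        for (int i = lo - 1; i >= 0 && out.size() < count; i--) {
            out.add(ids[i]);
        }
        return out;
    }
}
```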
Your server must support at least the following command-line options, which may be specified when the server is started in order to configure it:
-port p | Port number on which the server should listen for incoming requests (default: 80). |
-workspace path | Path to a directory that Tweeter can use to store its data in files. If this directory already contains information when Tweeter starts up, Tweeter should assume that this is old state left behind when a previous server crashed; the new server should use this information to initialize itself. |
-help | If this option is specified (no value needed), Tweeter should print out a help message describing the command-line arguments, then it should exit without doing anything else. |
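A hand-rolled Java parser for the three required options (Options and its USAGE text are hypothetical). The handout gives no default for -workspace, so workspace is simply left null when it is absent. Unknown options and missing values are reported as IllegalArgumentException (which also covers the NumberFormatException from a non-numeric port), so a main method could catch it and print the usage text.

```java
// Hypothetical command-line options holder for the Tweeter server.
final class Options {
    static final String USAGE =
            "usage: tweeter [-port p] [-workspace path] [-help]\n"
            + "  -port p          port to listen on (default: 80)\n"
            + "  -workspace path  directory for durable state\n"
            + "  -help            print this message and exit";

    int port = 80;
    String workspace = null;  // no default in the handout; null means "not given"
    boolean help = false;

    static Options parse(String[] args) {
        Options o = new Options();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            switch (arg) {
                case "-help":
                    o.help = true;
                    break;
                case "-port":
                    o.port = Integer.parseInt(valueAfter(args, i));
                    i++;  // skip the value just consumed
                    if (o.port < 1 || o.port > 65535) {
                        throw new IllegalArgumentException("port out of range: " + o.port);
                    }
                    break;
                case "-workspace":
                    o.workspace = valueAfter(args, i);
                    i++;
                    break;
                default:
                    throw new IllegalArgumentException("unknown option: " + arg);
            }
        }
        return o;
    }

    private static String valueAfter(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException(args[i] + " requires a value");
        }
        return args[i + 1];
    }
}
```

In main, a server would check o.help first and, if set, print Options.USAGE and return before opening any socket.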
Your server must provide durable storage for both tweets and friendship information, so that information is not lost if a server crashes and restarts. This means that you must store information in files on disk and reuse the saved information when a server restarts. You may assume that servers do not crash in the middle of executing an operation; the only way a server crashes is for it to stop execution after completing an operation and returning the response. You may assume that the underlying operating system and storage device(s) are perfect and never crash or corrupt data.
It is up to you to decide how to represent information in files, but you must use ordinary files: do not use a database such as SQLite. You may find Java's serialization facilities useful for moving information to and from files.
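One possible scheme, sketched under stated assumptions rather than prescribed: append one line per completed state-changing operation to a log file in the workspace before sending the response, and replay the log on startup. Under the crash model above (the process stops only between operations, and the OS and disk never lose data), closing the file after each write is enough; no explicit sync is needed. Decoded tweet text can contain line breaks, so such fields should be stored in escaped form (for example, still URL-encoded). OpLog and the file name ops.log are hypothetical. Because the log grows without bound, a server might periodically write a snapshot (for instance with Java serialization) and start a fresh log.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical append-only operation log kept in the workspace directory.
final class OpLog {
    private final Path file;

    OpLog(Path workspace) throws IOException {
        Files.createDirectories(workspace);
        this.file = workspace.resolve("ops.log");
    }

    // Call after applying an operation in memory and before sending its response.
    synchronized void append(String line) throws IOException {
        if (line.indexOf('\n') >= 0 || line.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("log lines must not contain line breaks");
        }
        Files.write(file, List.of(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // On startup: every logged operation in order, or an empty list for a fresh workspace.
    synchronized List<String> replay() throws IOException {
        if (!Files.exists(file)) {
            return List.of();
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }
}
```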
The real Twitter service is implemented by clusters of hundreds or thousands of servers running in large datacenters, with request handling divided among the servers and special-purpose storage servers to make the data durable. For your Tweeter project you will use only a single server that stores all of the data locally. However, you must take reasonable steps in your design to handle a large workload:
Your implementation of Tweeter must satisfy the following requirements:
Socket and InputStreamReader classes. If in doubt about using a particular class, check with me.
There are several different ways you can test your Tweeter implementation. One approach is to use a Web browser and type URLs into the URL bar. This works for many requests, but a browser used this way will only generate HTTP GET requests.
Another option is to use the curl command-line tool to issue requests. curl offers a large number of arguments that can be used to invoke almost any imaginable HTTP request. Here are a few examples of curl commands:
curl "http://localhost:8080/friends/ids.json?user_id=100"
This command issues a GET request for the specified URL and prints the body of the response.
curl --data "my_id=99&status=Simple%20update" http://localhost:8080/statuses/update
This command issues a POST request for the specified URL and includes the value of the --data option as the body of the request. You must make sure that the value of the --data option is properly URL-encoded.
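curl --data "my_id=99" --data-urlencode "status=Simple update" http://localhost:8080/statuses/update
This example is similar to the previous one, except that curl automatically URL-encodes the value given with --data-urlencode (only the part after the first = sign, so the name status is sent unchanged). When --data and --data-urlencode are combined, curl joins the pieces with & to form the request body.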
I strongly recommend that you use logging to record information when interesting events happen; this will make debugging much easier. Logging can be as simple as printing messages to standard output; if you are really ambitious you can learn how to use log4j, but that is not necessary for this class.
Real Web services use logging extensively. In some cases they may log every single request, or even multiple log messages per request. A good rule of thumb for logging is to log everything you can possibly afford to log (you can't afford to log something if it would make your log file too large to store on disk, or if the logging requests would have a significant impact on performance).
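A minimal per-request logging sketch using java.util.logging from the standard library; note that its default console handler writes to standard error rather than standard output. RequestLog and the line format are hypothetical.

```java
import java.util.logging.Logger;

// Hypothetical per-request logger built on java.util.logging.
final class RequestLog {
    private static final Logger LOG = Logger.getLogger("tweeter.requests");

    // One line per request: method, target, status code, and elapsed time.
    static String format(String method, String target, int status, long micros) {
        return method + " " + target + " -> " + status + " (" + micros + " us)";
    }

    static void record(String method, String target, int status, long micros) {
        LOG.info(format(method, target, status, micros));
    }
}
```

The elapsed time could be measured by taking System.nanoTime() before and after handling a request and dividing the difference by 1000.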
I recommend using the Eclipse development environment for the projects in this class, but if there is some other development environment that you prefer, that's fine too. Please configure your development environment so that indent widths are 4 spaces, and only space characters are stored in files, not tabs (this will make it easier for me to review your code: tab characters and/or 8-space indents result in very long lines in the code review tool).
Here's how you can configure Eclipse for this:
Here is a list of features to check in your project. I recommend going over this list several times as you design and build your project, to make sure you have included all the required elements.
You will submit your project by creating an issue on a Web-based code review tool. For this class we will use the Rietveld code review tool, which is hosted on Google App Engine (this tool has a few features that make it more convenient for this class than the GitHub code review facilities). Here is how to submit your project:
project1 for the commit that you are submitting for the project. Push this tag to GitHub.
python upload.py --rev XXX..project1
Once you have created the code review, check to make sure it is visible at cs190codereview.appspot.com.
If you are planning to use late days for this project (or any project) please send me an email before the project deadline so that I know your plans. Send me another email once you eventually upload your code review.