
Twitter for Academics?

While the discussion about Twitter for academics is not new, it tends to focus on individuals or on Twitter's use in teaching. Here I will be looking at one particular feature of Twitter, the hashtag, which is used to organize information related to a specific topic.

For a while I have been subscribing to the RSS feed for #rstats, the hashtag used to label tweets related to R, a free software environment for statistical computing that there are many good reasons to use in academic research. According to my RSS feed reader there have been more than 1380 tweets since Jan 1, 2010, but how useful is this information really? Might there be a way to determine its quality?

What better way than using R itself to look at the posts?

For this I extended a function in Jeff Gentry's twitteR package (yes, you can tweet from within R). Taking advantage of the Twitter search API, I downloaded about 250 posts.

The following bar chart breaks down the number of tweets per username. Perhaps not surprisingly, and similar to a pattern often found on listservs, many users have only a few posts, while a few users post many times. Looking at the 50% quantile shows us that the break is at user "jeromyanglim", so 11 out of 104 users (10.6%) are responsible for half of the posts. If we trust that those users make meaningful contributions, we can assume that at least 50% of the messages are useful.
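To make the "half of the posts" calculation concrete, here is a minimal sketch in base R. It is not the code used for the chart above: the `users` vector is made-up toy data, and the variable names are my own. It counts tweets per user, sorts the counts, and finds the smallest group of top posters that covers at least 50% of all tweets.

```r
# Hypothetical toy data: one entry per tweet, giving the author's username.
users <- c("alice", "alice", "alice", "alice", "bob", "bob", "bob",
           "carol", "dave", "erin")

# Count tweets per user and sort from most to least active.
counts <- sort(table(users), decreasing = TRUE)

# Cumulative share of all tweets covered by the top-k users.
cum_share <- cumsum(as.vector(counts)) / sum(counts)

# Smallest number of top users that accounts for at least half of the tweets.
k <- which(cum_share >= 0.5)[1]
top_users <- names(counts)[seq_len(k)]

print(counts)
print(top_users)
```

Calling `barplot(counts, las = 2)` on the sorted table would draw a bar chart of the same kind as the one shown here.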

Tweets with #rstats hashtag broken down by username

Could the inclusion of links in a tweet be considered an indicator that the post is useful? The following chart breaks down the tweets accordingly. Assuming that posts containing links point to other useful information, almost two thirds of the posts could be of potential interest.
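A link check of this kind can be sketched with a single regular expression. The example below is an illustration under my own assumptions, not the original analysis: the `tweets` vector is invented, and "contains a link" is taken to mean "contains something starting with http:// or https://", which will miss bare domains.

```r
# Hypothetical example tweets (not taken from the original data set).
tweets <- c("New post on ggplot2 themes http://example.com/ggplot2 #rstats",
            "Struggling with factors again #rstats",
            "Slides from my talk: https://example.org/slides #rstats #statistics")

# Flag tweets containing something that looks like an http(s) URL.
has_link <- grepl("https?://\\S+", tweets, perl = TRUE)

# Share of tweets with at least one link, in percent.
pct_links <- 100 * mean(has_link)

print(table(has_link))
print(round(pct_links, 1))
```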

Percentage of tweets that contain links

Might the addition of certain other hashtags, themselves indicative of other useful lists, be an indication of value? The following is a rough breakdown of the hashtags used in the posts (not cleaned up for different spellings). In addition to #statistics, we find #hadoop, #textmate, #sna, #SAS and #HPC among the most listed, but yes, we do find #BeeR as well.
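Extracting and counting hashtags can be sketched in a few lines of base R. Again, the `tweets` vector is invented for illustration. Unlike the breakdown described above, this sketch lower-cases the tags so that spelling variants such as #HPC and #hpc are merged, and it drops #rstats itself, since every post carries it; both steps are my own assumptions about what a cleaned-up version might do.

```r
# Hypothetical example tweets (not taken from the original data set).
tweets <- c("Parallel R on a cluster #rstats #HPC #hadoop",
            "Editing R in #TextMate #rstats",
            "Network analysis with igraph #rstats #sna #hpc")

# Pull out every hashtag: a '#' followed by letters, digits or underscores.
tags <- unlist(regmatches(tweets, gregexpr("#[[:alnum:]_]+", tweets)))

# Lower-case to merge spelling variants, then drop the search tag itself.
tags <- tolower(tags)
tags <- tags[tags != "#rstats"]

# Frequency table of the remaining hashtags, most common first.
tag_counts <- sort(table(tags), decreasing = TRUE)

print(tag_counts)
```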

Hashtags used in #rstats posts

The deficiencies and limitations of this quick-and-dirty analysis are evident. However, in the future it might not be too hard to come up with some rough statistical indicators for the information value of tweets. After all, it may help make Twitter a more useful tool for academics.

If you want to try this yourself, here is my code.