Monthly Archives: January 2011

Wordclouds and Space

Wordclouds are widely popular, yet they generally use space arbitrarily, or at least not in a way that expresses any meaningful relationship or provides insight into the data. Drew Conway recently not only suggested an alternative take on Wordclouds, but went right ahead and implemented it in R. The core idea is to visually compare two bodies of text and to represent both the frequencies of words and the differences in their use spatially. Word size represents frequency as before, and a word's position along the x-axis represents the difference in how often each text uses it.
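To make the idea concrete, here is a minimal sketch in Python of the data behind such a comparative Wordcloud. It is not Drew Conway's R code. The function names (`word_counts`, `comparative_layout`), the simple tokenizer and the use of raw counts are all assumptions made for illustration. For each word that appears in both texts it computes a size (combined frequency) and an x position (difference in frequency).

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its word tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def comparative_layout(text_a, text_b):
    """Return size and x position for every word used in both texts.

    Hypothetical helper: 'size' is the combined count, 'x' is the
    count in text_b minus the count in text_a, so words favored by
    text_a get negative x and words favored by text_b get positive x.
    """
    counts_a = word_counts(text_a)
    counts_b = word_counts(text_b)
    shared = counts_a.keys() & counts_b.keys()  # only words used by both

    layout = []
    for word in shared:
        a, b = counts_a[word], counts_b[word]
        layout.append({"word": word, "x": b - a, "size": a + b})

    # Largest words first, ties broken alphabetically for stable output.
    return sorted(layout, key=lambda item: (-item["size"], item["word"]))
```

Raw counts keep the sketch short, but they favor the longer text. Dividing each count by the total number of words in its text before taking the difference would be a reasonable refinement when the two texts differ a lot in length.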

I used the same code to compare the commencement speeches from Steve Jobs and Oprah Winfrey.

Obviously there is a lot of overlapping vocabulary. The important thing to note when reading the graph is how size and position combine: the word "life", for example, is used often by both speakers, but somewhat more often by Steve Jobs.

There are certainly limits and drawbacks, both methodological (it only represents words used by both parties) and visual (for example, overlapping words and large bodies of text). But the core idea, namely to take advantage of spatial arrangement as a device that adds meaning to a visualization, is inspiring.