
Hi, I'm Alex Storer.

I work at the Stanford Graduate School of Business with researchers on data, visualization, analysis and computing. I like open source things and refined sugar.

Teaching programming

One of my tasks in the Winter Quarter is to co-teach a new MBA course with Guido Imbens titled “Programming for Data Analysis” (OIT 537). There are a lot of useful tools that folks use to do data analysis, but in this compressed course, we wanted to stay focused on tools that can be installed quickly and that provide a pleasant learning environment.

RStudio saves the day

RStudio is a better way to program in R for beginners than anything that currently exists for Python. Plus, it’s cross-platform and open source, and the installation process is easy. Even though I’d much rather code in Python than R, if I’m teaching, I can’t waste my time with Anaconda or the IPython Notebook or Emacs or the other tools that I use myself. They’re just too tough to set up, and too esoteric for your average student. RStudio is a nice balance.

What about SQL?

I’m not very good at SQL, but I’m aware that it’s a very powerful way to do high-level manipulations of data. Unfortunately, there’s not really a “SQL Studio” that I know of that provides that easy, cross-platform learning environment, particularly for datasets of the size that motivate learning something other than Excel.

Right now, I’m leaning towards using SQLite because it is so easy to set up, but that’s like saying I’m using R instead of Python: it names the language, not the environment students will work in.
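To give a sense of how little setup SQLite needs, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and numbers are made up for illustration, and an in-memory database stands in for a real file; a course dataset would of course be larger and loaded from disk.

```python
import sqlite3

# An in-memory database: no server to install, no configuration to write.
# Passing a filename instead of ":memory:" would create a database file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical example table, invented for this sketch.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("West", 120.0), ("East", 80.0), ("West", 30.0)],
)

# A typical "high-level manipulation": total amount per region.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(rows)
conn.close()
```

The point is only that the database itself is not the obstacle. What is still missing is the friendly editor around it.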

What IDE will make it easy to learn SQL?

In the worst case, I can always use SQL from within R, but that won’t do helpful things like syntax highlighting and auto-completion.

What does Hadley say about R and SQL?

Hadley Wickham, the driving force behind ggplot, reshape2, plyr, shiny and even RStudio, has written about integrating R with databases. He recommends SQLite for starting out. He also makes some good suggestions about best practices for integrating the two.