Google Insights and RCurl



Google Insights is nifty. If you’re logged in to your Google account, you can download the results as a CSV file. This is straightforward if you’re using a browser; if you’re trying to retrieve the results of queries using R, however, things get more complicated.

The following code retrieves the results of a Google Insights search for “Sarah Palin” as a data.frame. It uses the RCurl package to do all of the hard work.

username <- "username@gmail.com"
password <- "password_here"

loginURL <- "https://accounts.google.com/accounts/ServiceLogin"
authenticateURL <- "https://accounts.google.com/accounts/ServiceLoginAuth"

## library() stops with an error if RCurl is missing; require() only warns
library(RCurl)

ch <- getCurlHandle()

curlSetOpt(curl = ch,
            ssl.verifypeer = FALSE,
            useragent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13",
            timeout = 60,
            followlocation = TRUE,
            cookiejar = "./cookies",
            cookiefile = "./cookies")


## do Google Account login
loginPage <- getURL(loginURL, curl = ch)

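library(stringr)
## ignore.case() is deprecated as of stringr 1.0.0; regex(..., ignore_case = TRUE) replaces it
galx.pattern <- regex('name="GALX"\\s*value="([^"]+)"', ignore_case = TRUE)
## find the hidden GALX form field, then keep only its value (capture group 1)
galx.match <- str_extract(string = loginPage, pattern = galx.pattern)
galx <- str_replace(string = galx.match,
                    pattern = galx.pattern,
                    replacement = "\\1")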

authenticatePage <- postForm(authenticateURL, .params = list(Email = username, Passwd = password, GALX = galx), curl = ch)


## get Google Insights results CSV
insightsURL <- "http://www.google.com/insights/search/overviewReport"
resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", content = 1, export = 1), curl = ch)

if(isTRUE(unname(attr(resultsText, "Content-Type")[1] == "text/csv"))) {
  ## got CSV file

  ## create temporary connection from results
  tt <- textConnection(resultsText)

  resultsCSV <- read.csv(tt, header = FALSE)

  ## close connection
  close(tt)
} else {
  ## something went wrong; probably need to log in again?
  warning("Expected a text/csv response; the login may have failed or expired.")
}

download ‘Google Insights.R’ from gist.github.com
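
The GALX step in the script pulls the value of a hidden form field out of the login page with a regular expression. Here is a minimal sketch of the same idea in base R, run against a made-up HTML fragment so that it works without a network connection. The function name extractGalx and the sample markup are assumptions for illustration; the real login page may look different.

```r
## Hypothetical helper: return the value of the hidden GALX field,
## or NA if the field is not present in the page text.
extractGalx <- function(page) {
  ## regexec() reports the full match followed by each capture group
  m <- regmatches(page,
                  regexec('name="GALX"\\s*value="([^"]+)"', page,
                          ignore.case = TRUE))
  ## element 1 is the whole match, element 2 is the captured value
  m[[1]][2]
}

## Made-up fragment standing in for the real login page
samplePage <- '<input type="hidden" name="GALX" value="abc123XYZ">'
galx <- extractGalx(samplePage)
```

Checking is.na(galx) before calling postForm is one way to notice early that the login page has changed and the token was not found.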

I don’t have much else to say about this, but I hope that it will be helpful to someone.

You can add geographic or other restrictions to the query by including the parameters that appear in the URL when you refine your search in the Google Insights web interface. For instance, a basic search for “QUERY” gives the URL http://www.google.com/insights/search/#q=QUERY&cmpt=q, whereas the same search restricted to the state of New York gives http://www.google.com/insights/search/#q=QUERY&geo=US-NY&cmpt=q; the added parameter is “geo=US-NY”. To incorporate this into the script, change

resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", content = 1, export = 1), curl = ch)

to have the additional parameter in the .params list:

resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", geo = "US-NY", content = 1, export = 1), curl = ch)
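
If you run many queries with different restrictions, it may be easier to build the parameter list in one place. The following is only a sketch: insightsParams is a hypothetical helper, not part of the original script, and it handles just the geo parameter discussed above.

```r
## Hypothetical helper: build the .params list for getForm(),
## adding `geo` only when a region code is supplied.
insightsParams <- function(query, geo = NULL) {
  params <- list(q = query, cmpt = "q")
  if (!is.null(geo)) {
    params$geo <- geo
  }
  ## content and export are copied unchanged from the script
  c(params, list(content = 1, export = 1))
}

nyParams <- insightsParams("Sarah Palin", geo = "US-NY")

## Intended use with the handle and URL from the script:
## resultsText <- getForm(insightsURL, .params = nyParams, curl = ch)
```

Other parameters seen in the web interface's URL could be added to the helper in the same way.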

[Updated 2012-04-24]



About

I’m a grad student in economics at Stanford University. My papers can be found at my academic website.

I’ve disabled comments on the blog but welcome any thoughts you might have; you can reach me by email at [my last name]@stanford.edu
