Dan Knoepfle's Blog

Aquamacs 2.2 and ESS

News of the latest release of Aquamacs, version 2.2, appeared this week in my echo area. Given the opportunity to procrastinate, I dropped everything and upgraded; returning to work, I noticed that the version of ESS shipped with Aquamacs 2.2 is ESS 5.8, released over a year ago. The latest ESS is 5.13, available from http://ess.r-project … ads/ess/ess-5.13.tgz; the easiest way to install is described by Simon Jackman here and elaborated below:

Unarchive ESS and navigate to the folder created; edit Makeconf to set the following:


Open, cd to the directory created when you extracted ESS (tip: drag the little folder icon from the top of the Finder window to copy the path). Then gmake install and the updated ESS will overwrite the old version inside the app package. Done! Hopefully this post will save someone the trouble of figuring out where ESS hides deep inside the package.

[Update (4/02/11)]: Martin Maechler was kind enough to add the above to the ESS Makeconf; for future versions of ESS, Aquamacs users should simply have to uncomment the appropriate lines in the Makeconf file.

MATLAB / R Reference

Anyone with a MATLAB background interested in transitioning to R is advised to check out this MATLAB / R Reference by Professor David Hiebeler of the University of Maine.

Google Insights and RCurl

Google Insights is nifty. If you’re logged in to your Google account, you can download the results as a CSV file. This is straightforward if you’re using a browser; if you’re trying to retrieve the results of queries using R, however, things get more complicated.

The following code retrieves the results of a Google Insights search for “Sarah Palin” as a data.frame. It uses the RCurl package to do all of the hard work.

library(RCurl)    ## curl bindings for R
library(stringr)  ## for str_extract / str_replace below

username <- ""               ## your Google account email
password <- "password_here"  ## your Google account password

loginURL <- ""          ## Google Accounts login page
authenticateURL <- ""   ## Google Accounts authentication endpoint


ch <- getCurlHandle()

curlSetOpt(curl = ch,
            ssl.verifypeer = FALSE,
            useragent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv: Gecko/20101203 Firefox/3.6.13",
            timeout = 60,
            followlocation = TRUE,
            cookiejar = "./cookies",
            cookiefile = "./cookies")

## do Google Account login
loginPage <- getURL(loginURL, curl = ch)

galx.match <- str_extract(string = loginPage,
                          pattern = 'name="GALX"\\s*value="([^"]+)"')
galx <- str_replace(string = galx.match,
                    pattern = 'name="GALX"\\s*value="([^"]+)"',
                    replacement = "\\1")

authenticatePage <- postForm(authenticateURL, .params = list(Email = username, Passwd = password, GALX = galx), curl = ch)

## get Google Insights results CSV
insightsURL <- ""
resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", content = 1, export = 1), curl = ch)

if(isTRUE(unname(attr(resultsText, "Content-Type")[1] == "text/csv"))) {
  ## got CSV file

  ## create temporary connection from results
  tt <- textConnection(resultsText)

  resultsCSV <- read.csv(tt, header = FALSE)

  ## close connection
  close(tt)
} else {
  ## something went wrong

  ## probably need to log in again?
}

download ‘Google Insights.R’ from

I don’t have much else to say about this, but I hope that it will be helpful to someone.
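As an aside, the textConnection / read.csv pattern used above works on any character data already in memory, not just RCurl results. Here's a minimal, self-contained sketch (the CSV text is invented for illustration):

```r
## a CSV payload held in a character vector, as if returned by getForm
csvText <- "week,interest\n2011-01-02,48\n2011-01-09,53\n"

## open a connection to the in-memory text and read it like a file
tt <- textConnection(csvText)
results <- read.csv(tt, header = TRUE)
close(tt)  ## always close the connection when done

print(results)
```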

You can change the query to incorporate geographic restrictions or such by adding the parameters that appear in the URL when you change your search through the Google Insights web search; for instance, a basic search for “QUERY” gives URL whereas the same search restricted to the state of New York has URL; the added parameter is “geo=US-NY”. To incorporate this into the script, change

resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", content = 1, export = 1), curl = ch)

to have the additional parameter in the .params list:

resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", geo = "US-NY", content = 1, export = 1), curl = ch)

[Updated 2012-04-24]

How to buy a used car with R (part 2)

Continued from Part 1.

Part 2: Digging into the Kelley Blue Book

The only thing better than a bit of data is a lot of data. Now that we can grab KBB values for a given trim of a given model in a given year, we set our ambitions higher: automating the collection of these values for all trims of a model over a set of years. To do so, let’s back up and recall how we got to the KBB results page:

Let’s suppose we’re still set on the Honda Accord and are considering the last ten model years. Going with “Search by: Year, Make & Model”, we get to the following self-explanatory screen:


Choosing (2005, Honda, Accord) pushes us to the following address: There, we are reminded that the KBB reports different values for retail, certified retail, private sellers, and trade-ins:


Let’s go with “Private Party Value” for now; we end up at We’re now presented with a plethora of different trims, enough to make us nostalgic for Henry Ford:


Start with the “DX Sedan 4D”. We arrive at If the previous screen didn’t freak us out, this one definitely should, but if we ignore the options at the bottom (which are set to their standard values for the given model year and trim), we’re left with the important parameters: the choice of automatic or manual transmission and the mileage (and the ZIP code, which I’ll discuss later).

I can’t drive stick, so I’m not particularly worried about changing the transmission from its default of Automatic. But if you wanted to, note that choosing Automatic with default options and 10,000 miles pushes you to whereas choosing Manual, 5-Spd with the same options and mileage gives|true&mileage=10000.


Either way, we end up at a completely pointless page: no matter what you select, the results page gives values for all conditions.


Say we select “Good”. The results page for the Automatic is located at and the results page for the Manual, 5-Spd is located at|true&mileage=10000. If we want, we can tear off the “condition” field, in which case the default condition, Excellent, is highlighted.

So, if we want to grab results for a bunch of different years and trims, we need to figure out the id=846 part of the URL (and possibly the equipment=35014|true part if we’re after a manual transmission). Again, it’s time for Firebug. Back up to the trim selection page at and load up Firebug. If we examine the links for the various trims, we see that the links for the available trims are contained within a div with id='UCPathTrim'.


The next step is to write some R code to parse the trim selection page and pull out the available trims and their corresponding id values. This will make use of some of the core functionality of the XML package.

The XML package and HTML documents

In the last post, we used the function readHTMLTable from the XML package to read the results from a webpage into an R data.frame. At the time, there was little mention of the technical details; now, we’re moving beyond convenient functions and into the great unknown.

The XML package, written by Professor Duncan Temple Lang of UC Davis, is a wrapper for libxml2. The package website, hosted by The Omega Project for Statistical Computing, is at, and the package listing on CRAN is located at

At its core, the XML package is meant for parsing XML and HTML documents into tree structures and selecting and extracting or otherwise manipulating branches or nodes of the trees. Take a look at the HTML tab of Firebug again (on, and note that the webpage consists of a tree of HTML tags. At its root, there’s a html node, with children head and body; within the body branch are nodes defining the structure of the document, including a branch descending from a div node (<div class="modCBox UCPathModule" id="UCPathTrim">) containing a branch descending from a span node (<span class="sectContent">) with leaf nodes like <a class="link_circle_arrow_blue" href="/used-cars/honda/accord/2005/private-party-value/equipment?id=846"> Accord DX Sedan 4D</a>.

Now, moving to R, we’ll look at the tree produced by the XML package for this document. The first section of code should be fairly straightforward:

## download the webpage
kbbHTML <- readLines("")

## load the XML package and parse the downloaded document
kbbTree <- htmlTreeParse(kbbHTML, asText = TRUE)

## get the root ('html') node
kbbRoot <- xmlRoot(kbbTree)

Each node object (class XMLNode) is also a list containing its immediate children as node objects.

> ## print the child nodes ('head' and 'body')
> print(summary(kbbRoot))
     Length Class   Mode
head 14     XMLNode list
body 19     XMLNode list

Thus, we can get the body of the document:

## select the 'body' child node using the usual R list element extraction syntax
kbbBody <- kbbRoot[["body"]]

Within the body, there’s a bunch of child nodes (the same ones we see in Firebug, of course):

> ## print the child nodes of the 'body'
> print(summary(kbbBody))
         Length Class          Mode
script   1      XMLNode        list
script   1      XMLNode        list
div      4      XMLNode        list
comment  0      XMLCommentNode list
script   0      XMLNode        list
script   1      XMLNode        list
script   1      XMLNode        list
script   1      XMLNode        list
noscript 1      XMLNode        list
comment  0      XMLCommentNode list
comment  0      XMLCommentNode list
script   0      XMLNode        list
div      2      XMLNode        list
script   0      XMLNode        list
script   1      XMLNode        list
comment  0      XMLCommentNode list
script   1      XMLNode        list
noscript 1      XMLNode        list
comment  0      XMLCommentNode list

Either by looking at the tree in Firebug or using summaries of the tree in R, we can identify the div node we’re looking for and access the corresponding node object in R:

## select our 'div id="UCPathTrim"...' node; instead of using node
## names (like 'div'), which aren't necessarily unique here, we use
## indices (we want the first child of the first child of the second
## child of the second child of the third child of 'body')
divUCPathTrim <- kbbBody[[3]][[2]][[2]][[1]][[1]]
> ## print the child nodes
> print(summary(divUCPathTrim))
     Length Class       Mode
h2   1      XMLNode     list
text 0      XMLTextNode list
span 9      XMLNode     list

We can then access the trim links, which are the leaf nodes of the span node under divUCPathTrim. Printing an XMLNode object outputs the raw HTML.

> ## print the HTML of the first of the link leaf nodes (children of the 'span' node)
> print(divUCPathTrim[["span"]][[1]])
<a href="/used-cars/honda/accord/2005/private-party-value/equipment?id=846" class="link_circle_arrow_blue">Accord DX Sedan 4D</a>

To get the node contents (here, the trim label), we use the xmlValue function:

> ## print the *contents* of this leaf node
> print(xmlValue(divUCPathTrim[["span"]][[1]]))
[1] "Accord DX Sedan 4D"

To get the link target (the ‘href’ attribute), we use the xmlAttrs function:

> ## print the 'href' attribute of this leaf node
> print(xmlAttrs(divUCPathTrim[["span"]][[1]])[["href"]])
[1] "/used-cars/honda/accord/2005/private-party-value/equipment?id=846"

There’s an easier way to select a set of nodes and apply functions over this set. To do so, we must learn a bit of XPath.


XPath is a query language for selecting sets of nodes from XML or XML-like documents (like HTML webpages). A nice quick introduction to XPath syntax is the article XPath Syntax. Open it in a tab, read it, and come back.

Done? Good. If we’re super lazy, we can use Firebug to generate an XPath expression to select a given node—just right click on the node and choose “Copy XPath”. Here’s the XPath expression for the second of the nine trim links:


To select all of the nine trim links, we simply chop off the “[2]” on the end (match all a nodes that are children of that span):


If we want a short XPath expression, we can instead use something like this:

//div[@id = 'UCPathTrim']//a

That is, we select all a nodes that descend from any div node with attribute id='UCPathTrim'. In XPath syntax, “//nodename” selects descendant nodes named nodename while “/nodename” selects child nodes named nodename (immediate descendants). Using double forward slashes allows us to skip specifying intermediate nodes. Expressions within brackets are conditions, evaluated to booleans, specifying whether a node should or should not be included.

Is there any advantage to using one expression over the other? So long as the structure of the webpage doesn’t change, both will work; however, if the order of the nodes in the document changes, the former expression will fail, but the latter will continue to work (it selects on the div id attribute rather than its position in the document). Similarly, if the div id changes but the document structure otherwise remains unchanged (this is unlikely, but might happen if they messed around with their CSS styling or something), the former would continue working but the latter would fail.

We can create a fancier XPath expression using XPath functions that will continue to work so long as the KBB URL scheme stays the same. Since the rest of the code will depend on this remaining constant, our XPath expression should only fail at the same time as the rest of our code. A list of XPath functions can be found here. We’ll use the function contains(x, y), which returns true if string x contains string y (else false). Our XPath expression is:

//a[contains(@href, 'used-cars/honda/accord/2005/private-party-value/equipment')]

This selects all links with target URLs containing ‘used-cars/honda/accord/2005/private-party-value/equipment’.
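Both expressions can be checked offline. Here's a sketch using the XML package's getNodeSet (covered properly in the next section) on a toy document that mimics the structure of the trim page; the HTML snippet is invented for illustration:

```r
library(XML)

## a toy document mimicking the trim selection page structure
toyHTML <- '<html><body>
  <div id="UCPathTrim"><span>
    <a href="/used-cars/honda/accord/2005/private-party-value/equipment?id=846">Accord DX Sedan 4D</a>
    <a href="/used-cars/honda/accord/2005/private-party-value/equipment?id=863">Accord EX Coupe 2D</a>
  </span></div>
</body></html>'

doc <- htmlParse(toyHTML, asText = TRUE)

## the id-based expression and the contains()-based expression
## should select the same two links
byId       <- getNodeSet(doc, "//div[@id = 'UCPathTrim']//a")
byContains <- getNodeSet(doc,
  "//a[contains(@href, 'used-cars/honda/accord/2005/private-party-value/equipment')]")

print(sapply(byId, xmlValue))
print(length(byContains))
```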

getNodeSet and xpathApply

To use XPath with the XML package, we need to parse the document a little differently. You see, the XML package can either parse the document into a tree structure of R objects (as we did above, using htmlTreeParse) or into a tree structure of pointers to C-level objects. In the latter case, the parsed structure is maintained as lower-level objects in memory, and is not immediately accessible in R. Indeed, incorrectly accessing the parsed document object can cause R to crash. However, parsing the document into this C-level structure internal to libxml2 permits the use of XPath expressions. For more, do help("xmlParse").

In practice, using XPath expressions with the XML package is fairly simple. We parse the document with htmlParse instead of htmlTreeParse, and select sets of nodes corresponding to XPath expressions using getNodeSet. We can then lapply or sapply over the resulting nodeset. If we only need to apply a single function, we can instead use xpathApply to apply a function to an XPath-defined set directly.
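As a quick, self-contained illustration of the equivalence (on a made-up snippet, not the actual KBB page):

```r
library(XML)

## toy document standing in for the parsed page
doc <- htmlParse('<html><body><span>
  <a href="equipment?id=846">DX</a><a href="equipment?id=863">EX</a>
</span></body></html>', asText = TRUE)

## getNodeSet followed by sapply ...
labels1 <- sapply(getNodeSet(doc, "//a"), xmlValue)

## ... is equivalent to a single xpathApply call
labels2 <- unlist(xpathApply(doc, "//a", xmlValue))

print(identical(labels1, labels2))
```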

## parse the downloaded document to an XMLInternalDocument
kbbInternalTree <- htmlParse(kbbHTML, asText = TRUE)

## select nodes matching our XPath expression
xpath.expression <- "//a[contains(@href,'/used-cars/honda/accord/2005/private-party-value/equipment')]"
trim.nodes <- getNodeSet(doc = kbbInternalTree,
                         path = xpath.expression)
> ## the result is of class "XMLNodeSet", a list of 9 externalptr
> ## objects of class "XMLInternalElementNode"
> print(summary(trim.nodes))
      Length Class                  Mode       
 [1,] 1      XMLInternalElementNode externalptr
 [2,] 1      XMLInternalElementNode externalptr
 [3,] 1      XMLInternalElementNode externalptr
 [4,] 1      XMLInternalElementNode externalptr
 [5,] 1      XMLInternalElementNode externalptr
 [6,] 1      XMLInternalElementNode externalptr
 [7,] 1      XMLInternalElementNode externalptr
 [8,] 1      XMLInternalElementNode externalptr
 [9,] 1      XMLInternalElementNode externalptr
> ## we can now lapply or sapply over this list object
> print(lapply(trim.nodes, function(x) c(xmlValue(x), xmlAttrs(x)[["href"]])))
[1] " Accord DX Sedan 4D"                                              
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=846"

[1] " Accord EX Coupe 2D"                                              
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=863"

[1] " Accord EX Sedan 4D"                                              
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=859"

[1] " Accord EX-L Coupe 2D"                                               
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=263736"

[1] " Accord EX-L Sedan 4D"                                               
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=263737"

[1] " Accord Hybrid Sedan 4D"                                          
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=868"

[1] " Accord LX Coupe 2D"                                              
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=856"

[1] " Accord LX Sedan 4D"                                              
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=850"

[1] " Accord LX Special Edition Coupe 2D"                              
[2] "/used-cars/honda/accord/2005/private-party-value/equipment?id=867"

Putting it all together

I’m getting tired, so let’s jump ahead to a complete function that retrieves all of the trims for a given year. If you’ve read and understood everything above, you should be able to figure out how the function works without much trouble (with the possible exception of the XPath expression, which needlessly uses regular expressions). Go wild with help(...) until it all makes sense.

getKBBYearTrims <- function(prefix, year, type = "private-party-value") {

  kbbTrimPageURL <- sprintf("%s%i/%s", prefix, year, type)
  cat("Loading", kbbTrimPageURL, "\n")

  x <- readLines(kbbTrimPageURL)
  g <- htmlParse(x, asText=TRUE)

  xpath <- gsub("([http:/w.]+kbb\\.com/)(.*)", "//a[contains(@href, '\\2/equipment')]", kbbTrimPageURL)
  cat("XPath expression is:", xpath, "\n")

  trims <- getNodeSet(doc = g, path = xpath)
  trimlabels <- sapply(trims, xmlValue)
  trimids <- sapply(trims, function(node) sub(".*id=([[:digit:]]+)$", "\\1", xmlAttrs(node)[["href"]]))

  trimtable <- data.frame(year = year,
                          trim = trimlabels,
                          id = trimids,
                          stringsAsFactors = FALSE)

  return(trimtable)
}

The function works great for 2005 Accords:

> ## print trims and ids for 2005 Honda Accords
> print(getKBBYearTrims(prefix = "", year = 2005))
XPath expression is: //a[contains(@href, 'used-cars/honda/accord/2005/private-party-value/equipment')] 
  year                                trim     id
1 2005                  Accord DX Sedan 4D    846
2 2005                  Accord EX Coupe 2D    863
3 2005                  Accord EX Sedan 4D    859
4 2005                Accord EX-L Coupe 2D 263736
5 2005                Accord EX-L Sedan 4D 263737
6 2005              Accord Hybrid Sedan 4D    868
7 2005                  Accord LX Coupe 2D    856
8 2005                  Accord LX Sedan 4D    850
9 2005  Accord LX Special Edition Coupe 2D    867

The following function wraps getKBBYearTrims to return a data.frame of trims for a set of model years.

getKBBTrims <- function(prefix, years, type = "private-party-value") {

  kbbTrimList <- lapply(years, function(year) getKBBYearTrims(prefix, year, type))
  kbbTrims <-"rbind", kbbTrimList)

  return(kbbTrims)
}


Using it, we can try getting the trims for a series of model years:

> ## print trims and ids for years 2003 to 2007
> accord.trims <- getKBBTrims(prefix = "", years = 2003:2007)
XPath expression is: //a[contains(@href, 'used-cars/honda/accord/2003/private-party-value/equipment')] 
XPath expression is: //a[contains(@href, 'used-cars/honda/accord/2004/private-party-value/equipment')] 
XPath expression is: //a[contains(@href, 'used-cars/honda/accord/2005/private-party-value/equipment')] 
XPath expression is: //a[contains(@href, 'used-cars/honda/accord/2006/private-party-value/equipment')] 
XPath expression is: //a[contains(@href, 'used-cars/honda/accord/2007/private-party-value/equipment')]
> print(accord.trims)
   year                                trim     id
1  2003                  Accord DX Sedan 4D   2488
2  2003                  Accord EX Coupe 2D   2496
3  2003                  Accord EX Sedan 4D   2498
4  2003                Accord EX-L Coupe 2D 263731
5  2003                Accord EX-L Sedan 4D 263730
6  2003                  Accord LX Coupe 2D   2495
7  2003                  Accord LX Sedan 4D   2492
8  2004                  Accord DX Sedan 4D   2664
9  2004                  Accord EX Coupe 2D   2671
10 2004                  Accord EX Sedan 4D   2676
11 2004                Accord EX-L Coupe 2D 263735
12 2004                Accord EX-L Sedan 4D 263734
13 2004                  Accord LX Coupe 2D   2669
14 2004                  Accord LX Sedan 4D   2663
15 2005                  Accord DX Sedan 4D    846
16 2005                  Accord EX Coupe 2D    863
17 2005                  Accord EX Sedan 4D    859
18 2005                Accord EX-L Coupe 2D 263736
19 2005                Accord EX-L Sedan 4D 263737
20 2005              Accord Hybrid Sedan 4D    868
21 2005                  Accord LX Coupe 2D    856
22 2005                  Accord LX Sedan 4D    850
23 2005  Accord LX Special Edition Coupe 2D    867
24 2006                  Accord EX Coupe 2D    741
25 2006                  Accord EX Sedan 4D    739
26 2006                Accord EX-L Coupe 2D 263727
27 2006                Accord EX-L Sedan 4D 263726
28 2006              Accord Hybrid Sedan 4D    744
29 2006                  Accord LX Coupe 2D    736
30 2006                  Accord LX Sedan 4D    734
31 2006                  Accord SE Sedan 4D    738
32 2006                  Accord VP Sedan 4D    737
33 2007                  Accord EX Coupe 2D  83835
34 2007                  Accord EX Sedan 4D  83834
35 2007                Accord EX-L Coupe 2D 263674
36 2007                Accord EX-L Sedan 4D 263675
37 2007              Accord Hybrid Sedan 4D  83836
38 2007                  Accord LX Coupe 2D  83833
39 2007                  Accord LX Sedan 4D  83829
40 2007                  Accord SE Sedan 4D  83832
41 2007                  Accord VP Sedan 4D  83827

Everything works great. What a shock.

How to buy a used car with R (part 1)

I’m in the process of buying a used car. Since I enjoy making these decisions as complicated as possible, I’ve written some R code to scrape relevant websites for informative data. I’ve written this up as a blog entry because I think it’s a decent example of how one might use the XML package and Firebug to quickly and easily bring data from websites into R.

Part 1: Scraping the surface of the Kelley Blue Book

In the past, the first resource a used car buyer looking for price information might have turned to was the Kelley Blue Book; now, this information is available for free at


Finding the data with Firebug

For now, I’m going to skip ahead to the page containing the kind of information that we want; later, I’ll back up and go through the process of getting to that page and detail how I wrote some simple functions automating queries for different parameters.

Here’s, giving the KBB private party value for a 2005 Honda Accord DX Sedan with automatic transmission, standard options, and 10,000 miles:


To get at the data we want, we need to identify where it is located in the structure of the page. While one can do this by simply reading the HTML source code, Firebug makes things much simpler. Load up Firebug and go to the HTML tab. Click the Inspect Element button (or go to the Firebug menu and choose Inspect Element); as you mouse-over elements on the page, you’ll notice that the corresponding tag in the HTML element tree is opened and highlighted. In the screenshot below, I’ve clicked on the value for the Excellent condition:


Examining the HTML tree in the Firebug display, we can see that all of the information we’re interested in is contained in a table with id ‘priceCondition’. Similarly, if you’re using Google Chrome, you can accomplish the same thing with the Developer Tools. Below, Firefox is on the left and Chrome is on the right:

kbb10.png kbb11.png

Parsing the web with the XML package

The XML package includes a convenient function called readHTMLTable to grab the data from the table we identified earlier. We can simply give it the URL of the page and it returns a list containing each of the page’s tables as an R object (converting them to data.frame by default).

kbbURL <- ""
kbbTables <- readHTMLTable(kbbURL)

With this minimal amount of effort, we’re most of the way to what we’re after:

> print(kbbTables)
  Condition\r\n                \r\n   Value
2                         Excellent $12,340
3                              Good $11,665
4                              Fair $10,565

By explicitly specifying the header, skipping the first two rows, and extracting the ‘priceCondition’ data.frame itself, we’re left with the raw data we are interested in:

kbbTable <- readHTMLTable(doc = kbbURL,
                          header = c("Condition","Value"),
                          skip.rows = c(1,2))[["priceCondition"]]
> print(kbbTable)
  Condition   Value
1 Excellent $12,340
2      Good $11,665
3      Fair $10,565

Now, if we take a look at the URL we’re using,, it should be apparent that fetching these values for any given mileage won’t be any trouble. The following code gets the KBB values for 10,000 mile increments from 10,000 to 150,000 miles:

kbbURLPrefix <- ""
kbbValuesList <- lapply(seq(10000,150000,by=10000), function(m) {
  readHTMLTable(doc = paste(kbbURLPrefix,m,sep=""),
                header = c("Condition","Value"),
                skip.rows = c(1,2))[["priceCondition"]]
})
> length(kbbValuesList)
[1] 15
> head(kbbValuesList,2)
  Condition   Value
1 Excellent $12,340
2      Good $11,665
3      Fair $10,565

  Condition   Value
1 Excellent $11,965
2      Good $11,290
3      Fair $10,190

Finally, we can convert the list into one big data.frame and augment it with the corresponding mileages and the model year. This leaves us with a nice data.frame from which we can extract whatever information we desire.

kbbValues <-"rbind", kbbValuesList)
kbbValues$Mileage <- rep(seq(10000,150000,by=10000), each = 3)
kbbValues$Year <- 2005
> head(kbbValues)
  Condition   Value Mileage Year
1 Excellent $12,340   10000 2005
2      Good $11,665   10000 2005
3      Fair $10,565   10000 2005
4 Excellent $11,965   20000 2005
5      Good $11,290   20000 2005
6      Fair $10,190   20000 2005
> print(kbbValues[which(kbbValues$Condition == "Excellent"),c("Mileage","Value")])
   Mileage   Value
1    10000 $12,340
4    20000 $11,965
7    30000 $11,565
10   40000 $11,140
13   50000 $10,740
16   60000 $10,265
19   70000  $9,740
22   80000  $9,190
25   90000  $8,640
28  100000  $9,440
31  110000  $7,640
34  120000  $7,190
37  130000  $6,765
40  140000  $6,190
43  150000  $5,965

Graphing our results with ggplot

Our last trick for the day is a simple one: take the data and make a pretty picture. Having collected the KBB values for different conditions and mileages, it is straightforward to construct a plot of value versus mileage for each condition.

First, however, we need to convert the kbbValues$Value column from its current human-readable state (a factor with levels like “$10,265”) into a more natural form for analysis. A quick bit of regular expressions magic using gsub does the trick, and we’re left with a nice column of numbers:

kbbValues$Value <- as.numeric(gsub("[$,]","",kbbValues$Value))
> kbbValues$Value
 [1] 12340 11665 10565 11965 11290 10190 11565 10890  9790 11140 10465  9365
[13] 10740 10065  8965 10265  9590  8490  9740  9065  7965  9190  8515  7415
[25]  8640  7965  6865  9440  8765  7665  7640  6965  5865  7190  6515  5415
[37]  6765  6090  4990  6190  5515  4415  5965  5290  4190

Use of ggplot is a subject best left for another time. Here, it’s as simple as:

library(ggplot2)
ggplot(kbbValues, aes(x = Mileage, y = Value, color = Condition, group = Condition)) + geom_line()

This gives us the following beautiful plot:


Wait, what?

So, where did that peak at 100,000 miles come from?

Well, looking back, it’s clear that it’s present in the raw data in kbbValues. If we check the original page (, however, the values don’t match. What happened?

The culprit, and a correction, are in the code below:

kbbURLPrefix <- ""
kbbValuesList <- lapply(seq(10000,150000,by=10000), function(m) {
  currentURL <- sprintf("%s%i",kbbURLPrefix,m)
  cat(currentURL,"\n") # print debug info so we catch these errors!
  readHTMLTable(doc = currentURL,
                # The old version built the URL with paste, which coerces m
                # to character via as.character, and as.character(100000)
                # returns "1e+05":
                # doc = paste(kbbURLPrefix,m,sep=""),
                header = c("Condition","Value"),
                skip.rows = c(1,2))[["priceCondition"]]
})

Using the corrected procedure, we are rewarded with a nice, smooth graph: kbb-gg2.png
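The underlying gotcha is easy to reproduce in a clean session: paste coerces its arguments with as.character, which switches to scientific notation for 100000, while sprintf with an integer format (or format with scientific = FALSE) keeps the digits:

```r
## paste coerces via as.character, which uses scientific notation here
paste("...&mileage=", 100000, sep = "")   ## "...&mileage=1e+05"

## sprintf with an integer format keeps all the digits
sprintf("...&mileage=%i", 100000)         ## "...&mileage=100000"

## format with scientific = FALSE also works
format(100000, scientific = FALSE)        ## "100000"
```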


I’m a grad student in economics at Stanford University. My papers can be found at my academic website.

I’ve disabled comments on the blog but welcome any thoughts you might have; you can reach me by email at [my last name]

Site RSS feed 

