HierRegressions

From FarmShare


Revision as of 12:00, 8 September 2013

Hierarchical Linear Regressions

This Hierarchical Linear Model example walks through http://www.r-tutor.com/gpu-computing/rbayes/rhierlmc as implemented on FarmShare.

Linear regression is probably the most familiar technique of data analysis, but its application is often hamstrung by model assumptions. For instance, when the data have a hierarchical structure, the assumptions of linear regression often hold only at local levels. We will investigate an extension of the linear model to bi-level hierarchies.

Example

We borrow an example from Rossi, Allenby and McCulloch (2005) for demonstration. It is based on the 'cheese' data set from the bayesm package, which contains marketing data for a brand-name processed cheese: the weekly sales volume (VOLUME), unit retail price (PRICE), and display activity level (DISP) in various regional retailer accounts. For each account we can define a linear regression model of the log sales volume, where β1 is the intercept term, β2 is the display measure coefficient, and β3 is the log price coefficient.
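Written out as a sketch (notation follows bayesm's rhierLinearModel; the prior parameters match the "Prior Parms" printout echoed by the sampler runs later on this page), the bi-level model for account i is:

```latex
% Per-account regression, i = 1, ..., 88:
%   log(VOLUME) = beta_1 + beta_2 * DISP + beta_3 * log(PRICE) + error
y_i = X_i \beta_i + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma_i^2 I)

% Second level: the account coefficients are shrunk toward a common mean.
% (Z contains only an intercept here, so the general mean \Delta' z_i
% reduces to a single common mean \bar\beta.)
\beta_i \sim N(\bar\beta, V_\beta),
  \qquad V_\beta \sim IW(\nu, V)
```

In the runs below the defaults are ν = 6, V = 6·I₃ for the coefficient covariance, and ν_e = 3 degrees of freedom for the regression error variances, exactly as the sampler prints.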

The R libraries used here can be downloaded from: http://www.r-tutor.com/content/download


installing rpud

Rpud comes in two parts. The first is a standard R library that can be installed the usual way, except that we pass NVCC_OPT to instruct the code generator to produce code for the compute capability 2.0 devices we are using. The second part, rpudplus, is distributed as a binary blob that ships with its own install.sh.

Here we get and install the first part into your home directory (under lib/R).

module load cuda
mkdir -p /tmp/$USER
cd /tmp/$USER
wget http://www.r-tutor.com/sites/default/files/rpud/rpud_0.3.3.tar.gz
mkdir -p ~/lib/R
export R_LIBS_USER="${HOME}/lib/R"
NVCC_OPT="-arch compute_20" R CMD INSTALL --library=$R_LIBS_USER rpud_0.3.3.tar.gz

Here we get and install the second part:

module load cuda
cd /tmp/$USER
wget http://www.r-tutor.com/sites/default/files/rpud/rpudplus_0.3.3.tar.gz
tar xzf rpudplus_0.3.3.tar.gz
export R_LIBS_USER="${HOME}/lib/R"
cd rpudplus_0.3.3
./install.sh


running on CPU

As a comparison, we will first run on the CPU using the standard bayesm R library.

Save the following source code as rpudtestcpu.r:

library(bayesm)

data("cheese")
str(cheese)   # inspect the data frame; str("cheese") would only inspect the string

retailer <- levels(cheese$RETAILER)
nreg <- length(retailer); nreg

regdata <- NULL

for (i in 1:nreg) {
    filter <- cheese$RETAILER==retailer[i]
    y <- log(cheese$VOLUME[filter])
    X <- cbind(1,      # intercept placeholder
            cheese$DISP[filter],
            log(cheese$PRICE[filter]))
    regdata[[i]] <- list(y=y, X=X)
}

Data <- list(regdata=regdata)
Mcmc <- list(R=2000)

system.time(
outcpu <- bayesm::rhierLinearModel(
         Data=Data,
         Mcmc=Mcmc))

beta.3cpu <- mean(as.vector(outcpu$betadraw[, 3, 201:2000]))
beta.3cpu

Run it. In this run it took about 42 seconds.

cp -p /farmshare/software/examples/R/hierregress/cheese.rda .
R --no-save < rpudtestcpu.r

You should see output similar to:

bishopj@rye01:~$ R --no-save < rpudtestcpu.r 

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library(bayesm)
> 
> data("cheese")
> str("cheese")
 chr "cheese"
> 
> retailer <- levels(cheese$RETAILER)
> nreg <- length(retailer); nreg
[1] 88
> 
> regdata <- NULL
> 
> for (i in 1:nreg) {
+     filter <- cheese$RETAILER==retailer[i]
+     y <- log(cheese$VOLUME[filter])
+     X <- cbind(1,      # intercept placeholder
+             cheese$DISP[filter],
+             log(cheese$PRICE[filter]))
+     regdata[[i]] <- list(y=y, X=X)
+ }
> 
> Data <- list(regdata=regdata)
> Mcmc <- list(R=2000)
> 
> system.time(
+ outcpu <- bayesm::rhierLinearModel(
+          Data=Data,
+          Mcmc=Mcmc))
Z not specified -- putting in iota
 
Starting Gibbs Sampler for Linear Hierarchical Model
    88  Regressions
    1  Variables in Z (if 1, then only intercept)
 
Prior Parms: 
Deltabar
     [,1] [,2] [,3]
[1,]    0    0    0
A
     [,1]
[1,] 0.01
nu.e (d.f. parm for regression error variances)=  3
Vbeta ~ IW(nu,V)
nu =  6
V 
     [,1] [,2] [,3]
[1,]    6    0    0
[2,]    0    6    0
[3,]    0    0    6
 
MCMC parms: 
R=  2000  keep=  1
 
MCMC Iteration (est time to end - min) 
  100  ( 0.7 )
  200  ( 0.7 )
  300  ( 0.6 )
  400  ( 0.6 )
  500  ( 0.5 )
  600  ( 0.5 )
  700  ( 0.5 )
  800  ( 0.4 )
  900  ( 0.4 )
  1000  ( 0.4 )
  1100  ( 0.3 )
  1200  ( 0.3 )
  1300  ( 0.2 )
  1400  ( 0.2 )
  1500  ( 0.2 )
  1600  ( 0.1 )
  1700  ( 0.1 )
  1800  ( 0.1 )
  1900  ( 0 )
  2000  ( 0 )
  Total Time Elapsed:  0.71 
   user  system elapsed 
 42.288   0.036  42.448 
> 
> beta.3cpu <- mean(as.vector(outcpu$betadraw[, 3, 201:2000]))
> beta.3cpu
[1] -2.146799
>

running on GPU

Save the following source code as rpudtestgpu.r:

library(bayesm)

data("cheese")
str(cheese)   # inspect the data frame; str("cheese") would only inspect the string

retailer <- levels(cheese$RETAILER)
nreg <- length(retailer); nreg

regdata <- NULL

for (i in 1:nreg) {
    filter <- cheese$RETAILER==retailer[i]
    y <- log(cheese$VOLUME[filter])
    X <- cbind(1,      # intercept placeholder
            cheese$DISP[filter],
            log(cheese$PRICE[filter]))
    regdata[[i]] <- list(y=y, X=X)
}


Data <- list(regdata=regdata)
Mcmc <- list(R=2000)

library(rpud)    # load rpudplus

system.time(
outgpu <- rpud::rhierLinearModel(
	Data=Data,
	Mcmc=Mcmc,
	output="bayesm"))

beta.3 <- mean(as.vector(outgpu$betadraw[, 3, 201:2000]))
beta.3

Run it. On the GPU it takes 0.35 seconds, down from 42.448 seconds on the CPU.

bishopj@rye01:~$ module load cuda
bishopj@rye01:~$ export R_LIBS_USER=$HOME/lib/R
bishopj@rye01:~$ R --no-save < rpudtestgpu.r 

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library(bayesm)
> 
> data("cheese")
> str("cheese")
 chr "cheese"
> 
> retailer <- levels(cheese$RETAILER)
> nreg <- length(retailer); nreg
[1] 88
> 
> regdata <- NULL
> 
> for (i in 1:nreg) {
+     filter <- cheese$RETAILER==retailer[i]
+     y <- log(cheese$VOLUME[filter])
+     X <- cbind(1,      # intercept placeholder
+             cheese$DISP[filter],
+             log(cheese$PRICE[filter]))
+     regdata[[i]] <- list(y=y, X=X)
+ }
> 
> 
> Data <- list(regdata=regdata)
> Mcmc <- list(R=2000)
> 
> library(rpud)    # load rpudplus
Rpudplus 0.3.3
http://www.r-tutor.com
Copyright (C) 2010-2013 Chi Yau. All Rights Reserved.
Rpudplus is free for academic use only. There is absolutely NO warranty.

Attaching package: 'rpud'

The following object(s) are masked from 'package:bayesm':

    rhierLinearModel, rmultireg

> 
> system.time(
+ outgpu <- rpud::rhierLinearModel(
+ 	Data=Data,
+ 	Mcmc=Mcmc,
+ 	output="bayesm"))
 
Starting Gibbs Sampler for Linear Hierarchical Model
    88  Regressions
    1  Variables in Z (if 1, then only intercept)
 
Prior Parms: 
===========  
Deltabar
     [,1] [,2] [,3]
[1,]    0    0    0
 
A
     [,1]
[1,] 0.01
 
nu.e (d.f. for regression error variances) = 3
 
Vbeta ~ IW(nu,V)
 
nu = 6
 
V 
     [,1] [,2] [,3]
[1,]    6    0    0
[2,]    0    6    0
[3,]    0    0    6
 
MCMC parms: 
==========  
R = 2000, keep = 1
 
....................
   user  system elapsed 
  0.192   0.156   0.350 
> 
> beta.3 <- mean(as.vector(outgpu$betadraw[, 3, 201:2000]))
> beta.3 
[1] -2.145573
>