HierRegressions

From FarmShare


== Hierarchical Linear Regressions using R and CUDA ==

This Hierarchical Linear Model example walks through http://www.r-tutor.com/gpu-computing/rbayes/rhierlmc as implemented on FarmShare.

In particular, this example follows Rossi, Allenby and McCulloch (2005) and uses the 'cheese' data set from the bayesm package. The data set contains marketing data for a brand-name processed cheese in various regional retailer accounts: the weekly sales volume (VOLUME), the unit retail price (PRICE), and a display activity measure (DISP). For each retailer account we can define a linear regression model of the log sales volume,

log(VOLUME) = β1 + β2·DISP + β3·log(PRICE) + ε,

where β1 is the intercept term, β2 is the display measure coefficient, and β3 is the log price coefficient. The hierarchical model then ties the per-account coefficients together through a common prior.
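To see what each per-account regression looks like on its own, here is a minimal sketch (not part of the FarmShare walkthrough) that fits a single account by ordinary least squares with lm(); the hierarchical sampler below fits all 88 accounts jointly.

<source lang="r">
library(bayesm)    # provides the 'cheese' data set

data("cheese")

# fit the per-account model  log(VOLUME) = b1 + b2*DISP + b3*log(PRICE) + e
# for one retailer account only, using ordinary least squares
acct <- levels(cheese$RETAILER)[1]
sub  <- cheese[cheese$RETAILER == acct, ]
fit  <- lm(log(VOLUME) ~ DISP + log(PRICE), data = sub)
coef(fit)    # b1 (intercept), b2 (display), b3 (log price)
</source>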

The R libraries used here can be downloaded from: http://www.r-tutor.com/content/download


== installing rpud ==

Rpud comes in two parts. The first is a standard R library that can be installed in the usual way, except that we pass NVCC_OPT to instruct the code generator to produce code for the compute capability 2.0 devices we are using. The second part, rpudplus, is distributed as a binary blob with its own install.sh script.

Here we download and install the first part into your home directory (under ~/lib/R):

<source lang="sh">
module load cuda
mkdir -p /tmp/$USER
cd /tmp/$USER
wget http://www.r-tutor.com/sites/default/files/rpud/rpud_0.3.3.tar.gz
# install into a per-user R library under ~/lib/R
mkdir -p ~/lib/R
export R_LIBS_USER="${HOME}/lib/R"
# build for compute capability 2.0 devices
NVCC_OPT="-arch compute_20" R CMD INSTALL --library=$R_LIBS_USER rpud_0.3.3.tar.gz
</source>

Here we download and install the second part:

<source lang="sh">
module load cuda
cd /tmp/$USER
wget http://www.r-tutor.com/sites/default/files/rpud/rpudplus_0.3.3.tar.gz
export R_LIBS_USER="${HOME}/lib/R"
# unpack the tarball before running its installer
tar xzf rpudplus_0.3.3.tar.gz
cd rpudplus_0.3.3
./install.sh
</source>
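To confirm both parts are installed, a quick sanity check (a sketch, not part of the original walkthrough) is to start R after module load cuda and with R_LIBS_USER set as above, then load the package; you should see the Rpudplus banner.

<source lang="r">
# run inside R, after 'module load cuda' and with R_LIBS_USER="${HOME}/lib/R"
library(rpud)              # should print the Rpudplus 0.3.3 banner
packageVersion("rpud")     # confirms which version was picked up
</source>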


== running on CPU ==

As a baseline for comparison, we first run on the CPU using the standard bayesm R library.

Save the following source code as rpudtestcpu.r:

<source lang="r">
library(bayesm)

data("cheese")
str("cheese")    # note: str(cheese) would show the data frame structure

# build one regression (y, X) per retailer account
retailer <- levels(cheese$RETAILER)
nreg <- length(retailer); nreg

regdata <- NULL

for (i in 1:nreg) {
    filter <- cheese$RETAILER==retailer[i]
    y <- log(cheese$VOLUME[filter])
    X <- cbind(1,      # intercept placeholder
            cheese$DISP[filter],
            log(cheese$PRICE[filter]))
    regdata[[i]] <- list(y=y, X=X)
}

Data <- list(regdata=regdata)
Mcmc <- list(R=2000)    # 2000 MCMC draws

system.time(
outcpu <- bayesm::rhierLinearModel(
         Data=Data,
         Mcmc=Mcmc))

# average log price coefficient across accounts, dropping the first 200 draws as burn-in
beta.3cpu <- mean(as.vector(outcpu$betadraw[, 3, 201:2000]))
beta.3cpu
</source>

Run it. In this case it took 42 seconds.

<source lang="sh">
cp -p /farmshare/software/examples/R/hierregress/cheese.rda .
R --no-save < rpudtestcpu.r
</source>

You should see output similar to:

<source lang="sh">
bishopj@rye01:~$ R --no-save < rpudtestcpu.r

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library(bayesm)
> 
> data("cheese")
> str("cheese")
 chr "cheese"
> 
> retailer <- levels(cheese$RETAILER)
> nreg <- length(retailer); nreg
[1] 88
> 
> regdata <- NULL
> 
> for (i in 1:nreg) {
+     filter <- cheese$RETAILER==retailer[i]
+     y <- log(cheese$VOLUME[filter])
+     X <- cbind(1,      # intercept placeholder
+             cheese$DISP[filter],
+             log(cheese$PRICE[filter]))
+     regdata[[i]] <- list(y=y, X=X)
+ }
> 
> Data <- list(regdata=regdata)
> Mcmc <- list(R=2000)
> 
> system.time(
+ outcpu <- bayesm::rhierLinearModel(
+          Data=Data,
+          Mcmc=Mcmc))
Z not specified -- putting in iota
 
Starting Gibbs Sampler for Linear Hierarchical Model
    88  Regressions
    1  Variables in Z (if 1, then only intercept)
 
Prior Parms: 
Deltabar
     [,1] [,2] [,3]
[1,]    0    0    0
A
     [,1]
[1,] 0.01
nu.e (d.f. parm for regression error variances)=  3
Vbeta ~ IW(nu,V)
nu =  6
V 
     [,1] [,2] [,3]
[1,]    6    0    0
[2,]    0    6    0
[3,]    0    0    6
 
MCMC parms: 
R=  2000  keep=  1
 
MCMC Iteration (est time to end - min) 
  100  ( 0.7 )
  200  ( 0.7 )
  300  ( 0.6 )
  400  ( 0.6 )
  500  ( 0.5 )
  600  ( 0.5 )
  700  ( 0.5 )
  800  ( 0.4 )
  900  ( 0.4 )
  1000  ( 0.4 )
  1100  ( 0.3 )
  1200  ( 0.3 )
  1300  ( 0.2 )
  1400  ( 0.2 )
  1500  ( 0.2 )
  1600  ( 0.1 )
  1700  ( 0.1 )
  1800  ( 0.1 )
  1900  ( 0 )
  2000  ( 0 )
  Total Time Elapsed:  0.71 
   user  system elapsed 
 42.288   0.036  42.448 
> 
> beta.3cpu <- mean(as.vector(outcpu$betadraw[, 3, 201:2000]))
> beta.3cpu
[1] -2.146799
>
</source>
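The betadraw component of the output is a three-dimensional array (regressions x coefficients x MCMC draws), so outcpu$betadraw[, 3, 201:2000] selects the log price coefficient for all 88 accounts over the post-burn-in draws. A quick way to confirm the layout, run in the same session (a sketch, assuming outcpu is still in the workspace):

<source lang="r">
# betadraw is nreg x nvar x R: 88 accounts, 3 coefficients, 2000 draws
dim(outcpu$betadraw)
# posterior mean of each coefficient, averaged over accounts and kept draws
apply(outcpu$betadraw[, , 201:2000], 2, mean)
</source>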

== running on GPU ==

Save the following source code as rpudtestgpu.r:

<source lang="r">
library(bayesm)

data("cheese")
str("cheese")    # note: str(cheese) would show the data frame structure

retailer <- levels(cheese$RETAILER)
nreg <- length(retailer); nreg

regdata <- NULL

for (i in 1:nreg) {
    filter <- cheese$RETAILER==retailer[i]
    y <- log(cheese$VOLUME[filter])
    X <- cbind(1,      # intercept placeholder
            cheese$DISP[filter],
            log(cheese$PRICE[filter]))
    regdata[[i]] <- list(y=y, X=X)
}

Data <- list(regdata=regdata)
Mcmc <- list(R=2000)

library(rpud)    # load rpudplus; masks bayesm's rhierLinearModel

# GPU sampler; output="bayesm" returns results in bayesm's format
system.time(
outgpu <- rpud::rhierLinearModel(
	Data=Data,
	Mcmc=Mcmc,
	output="bayesm"))

beta.3 <- mean(as.vector(outgpu$betadraw[, 3, 201:2000]))
beta.3
</source>

Run it. On the GPU it takes 0.35 seconds, down from 42.448 seconds on the CPU.

<source lang="sh">
bishopj@rye01:~$ module load cuda
bishopj@rye01:~$ export R_LIBS_USER=$HOME/lib/R
bishopj@rye01:~$ R --no-save < rpudtestgpu.r 

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library(bayesm)
> 
> data("cheese")
> str("cheese")
 chr "cheese"
> 
> retailer <- levels(cheese$RETAILER)
> nreg <- length(retailer); nreg
[1] 88
> 
> regdata <- NULL
> 
> for (i in 1:nreg) {
+     filter <- cheese$RETAILER==retailer[i]
+     y <- log(cheese$VOLUME[filter])
+     X <- cbind(1,      # intercept placeholder
+             cheese$DISP[filter],
+             log(cheese$PRICE[filter]))
+     regdata[[i]] <- list(y=y, X=X)
+ }
> 
> 
> Data <- list(regdata=regdata)
> Mcmc <- list(R=2000)
> 
> library(rpud)    # load rpudplus
Rpudplus 0.3.3
http://www.r-tutor.com
Copyright (C) 2010-2013 Chi Yau. All Rights Reserved.
Rpudplus is free for academic use only. There is absolutely NO warranty.

Attaching package: 'rpud'

The following object(s) are masked from 'package:bayesm':

    rhierLinearModel, rmultireg

> 
> system.time(
+ outgpu <- rpud::rhierLinearModel(
+ 	Data=Data,
+ 	Mcmc=Mcmc,
+ 	output="bayesm"))
 
Starting Gibbs Sampler for Linear Hierarchical Model
    88  Regressions
    1  Variables in Z (if 1, then only intercept)
 
Prior Parms: 
===========  
Deltabar
     [,1] [,2] [,3]
[1,]    0    0    0
 
A
     [,1]
[1,] 0.01
 
nu.e (d.f. for regression error variances) = 3
 
Vbeta ~ IW(nu,V)
 
nu = 6
 
V 
     [,1] [,2] [,3]
[1,]    6    0    0
[2,]    0    6    0
[3,]    0    0    6
 
MCMC parms: 
==========  
R = 2000, keep = 1
 
....................
   user  system elapsed 
  0.192   0.156   0.350 
> 
> beta.3 <- mean(as.vector(outgpu$betadraw[, 3, 201:2000]))
> beta.3 
[1] -2.145573
>
</source>
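The CPU and GPU runs agree closely on the posterior mean of the log price coefficient (-2.146799 vs -2.145573; the small difference is Monte Carlo noise from different random draws), and the elapsed times work out to roughly a 120x speedup. A minimal sketch of that comparison, assuming both scripts have been run in the same R session so that outcpu and outgpu are available:

<source lang="r">
# compare posterior means of the log price coefficient from the two samplers
beta.3cpu <- mean(as.vector(outcpu$betadraw[, 3, 201:2000]))
beta.3gpu <- mean(as.vector(outgpu$betadraw[, 3, 201:2000]))
abs(beta.3cpu - beta.3gpu)    # should be small (Monte Carlo noise)

# elapsed-time speedup from the timings above
42.448 / 0.350                # roughly 120x
</source>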