Education 401C  Autumn 2013
    Data analysis examples using R


David Rogosa Sequoia 224,   rag{AT}stat{DOT}stanford{DOT}edu   Office hours: Thursday 2:10-3
Course web page: http://www.stanford.edu/~rag/ed401/


 Data analysis examples using R.      Ed401 Aut 2013 (1 unit)
Description
 We will do basic and intermediate level statistical analysis examples  
 (of the sort that students will have seen in their courses) in R.
see http://www.stanford.edu/~rag/ed401


Course Schedule
Five (2hr) mtgs Th 3:15-5  Room 160-321
Oct 3    1. Descriptive stats; analysis of means (up through anova) 
Oct 10   2. Correlation and regression (up through multiple regression, variable selection etc) 
Oct 17   3. Categorical vars (tables, logistic regression) 
Oct 24   4. Multilevel data, mixed-effects models (e.g. Bryk HSB data) 
Nov 7    5. Student analyses (students present a small analysis of their own)
 
  
  
Getting started (download and install R)
1/7/09.  NY Times endorses R: Data Analysts Captivated by R's Power
The current version of R is 3.0.1 (Good Sport), released on 2013-05-16. (Update: 3.0.2, Frisbee Sailing, was released on 2013-09-25.)
    For references and software: The R Project for Statistical Computing   Closest download mirror is Berkeley
Many students employ RStudio to enhance their R-enjoyment. I won't use it, but it serves very well especially on a single screen (e.g. portable) machine. "RStudio IDE is a powerful and productive user interface for R. It's free and open source, and works great on Windows, Mac, and Linux."     A short R-intro that includes RStudio

Resources
The greatest challenge here is not being overwhelmed by all the options.
0. Reference Cards and other short documents section of CRAN page
1. When I taught the introductory course Stat141, the text for computing was Using R for Introductory Statistics, J. Verzani, Chapman & Hall, 2005. An online version is available from John Verzani's page.   alternate version
2. In Stat209 the primary resource for R and data analysis is Data Analysis and Graphics Using R, J. Maindonald and J. Braun, Cambridge, 2nd edition 2007; 3rd edition 2010.    short draft version in CRAN      Text resource page
3. A handbook of statistical analyses using R (second edition). Brian Everitt, Torsten Hothorn CRC Press, Index of book chapters   Stanford access      Chaps 2-7 relevant to our materials. Data sets etc Package 'HSAUR2'
4. From CRAN central: An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics, Version 3.0.1 (2013-05-16). W. N. Venables, D. M. Smith and the R Core Team


WEEK 1 (10/3) Descriptive Statistics, Group Comparisons (including anova)
a. Harvey Goldstein's Exam data, single school excerpt. School14 data (ascii)  data analysis session    Sch14 graphics    data documentation
b. Andrew Gelman's Sesame Street data (read in Stata file using foreign package)     data import and analysis session   write out augmented data set (10/10)
    manual for package foreign    advanced data import guide: R Data Import/Export
c. One-way anova with Tukey multiple comparisons, Harrington data    data analysis session (oneway anova and mult comp)   
d. Two-way anova (stat141 ex) Soybeans. Soybean data (ascii)   Textbook description   Stat141 analysis
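
As a rough illustration of item c (one-way anova followed by Tukey multiple comparisons), here is a minimal sketch. It assumes the built-in PlantGrowth data as a stand-in, since the Harrington data are not reproduced on this page; the object names (desc, fit, tukey) are illustrative only and not taken from the course sessions.

```r
# One-way ANOVA with Tukey multiple comparisons.
# PlantGrowth ships with R (datasets package); it is a stand-in here,
# not one of the course data sets.
data("PlantGrowth")

# Descriptive statistics by group: n, mean, sd
desc <- aggregate(weight ~ group, data = PlantGrowth,
                  FUN = function(w) c(n = length(w), mean = mean(w), sd = sd(w)))
print(desc)

# One-way ANOVA: does mean weight differ across the three groups?
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))

# Tukey HSD: all pairwise differences with family-wise 95% intervals
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)
```

Calling plot(tukey) draws the pairwise intervals, which is often easier to read than the printed table.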

WEEK 2 (10/10) Correlation and Regression
a. Bivariate data
    i. correlation and scatterplots   platelet session      platelet plots       extra stat141 example Brain and Body Weights for 62 Species of Land Mammals
    ii. Straight-line regression     single subject Sleepstudy example    R session     plots and handout version
b. Multiple regression and interpretation of coefficients. MT woes of regression coefficients slides   R-session. Coleman data: adjusted-variables multiple regression   data file, 20 schools     Adjusted variable plot
10/15 additions.    using pairs command      generating a larger version of Coleman data and more plots
c. Instrumental variables regression.    Estimating the Return to Education for Married Women (Wooldridge text Ex 15.1).
     Mroz87 data      Mroz87 data description      IV data analysis session    Wooldridge Stata ivreg
10/15 Background exposition for IV and returns to schooling:  Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Joshua D. Angrist; Alan B. Krueger, The Journal of Economic Perspectives Vol. 15, No. 4 (Autumn, 2001), pp. 69-85
d. Missing data and multiple imputation methods.
nhanes data in package mice     R-session using mice package
  Background materials, Multiple Imputation in R: van Buuren S and Groothuis-Oudshoorn K (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. See also multiple imputation online.    Flexible Imputation of Missing Data. Stef van Buuren, Chapman and Hall/CRC 2012. Book contents online. Stef van Buuren is the originator of mice.   R resources: Multivariate Analysis Task View, Missing data section, esp. the mice package
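
The adjusted-variables idea in item b can be sketched in a few lines. This is a minimal sketch, assuming the built-in mtcars data as a stand-in for the Coleman data; the variable choice (mpg, wt, hp) and object names are illustrative assumptions. The coefficient of wt in the multiple regression equals the slope from regressing the residuals of mpg (adjusted for hp) on the residuals of wt (adjusted for hp).

```r
# Adjusted-variable (added-variable) view of a multiple regression coefficient.
# mtcars ships with R; it is a stand-in here, not the Coleman data.
data("mtcars")

# Multiple regression: mpg on weight and horsepower
full <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(full))

# Adjust the outcome and the predictor of interest for the other predictor
e_y <- resid(lm(mpg ~ hp, data = mtcars))  # mpg with hp partialled out
e_x <- resid(lm(wt ~ hp, data = mtcars))   # wt with hp partialled out

# Straight-line regression on the adjusted variables
adj <- lm(e_y ~ e_x)

# The two slopes agree (up to rounding error)
print(c(multiple = unname(coef(full)["wt"]),
        adjusted = unname(coef(adj)["e_x"])))
```

plot(e_x, e_y) followed by abline(adj) gives the corresponding adjusted variable plot.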

WEEK 3 (10/17) Categorical Data, Generalized Linear Models    Halloween edition
a. Proportions (1x2) and (1xK) tables.    proportions R session    For Titanic data, music Pete Seeger - The Titanic
b. 2x2 and rxc tables; independence and odds ratios.   R session, Titanic and nightmares
c. 2x2x2 tables; Simpson's paradox.   Death penalty example (Agresti)   
d. Dichotomous outcomes, logistic regression (glm logit link)       Donner party data         Donner analysis handout
e. Counts; more generalized linear models (log link)       AIDS in Belgium R-session        Source:   AIDS in Belgium example, (from Simon Wood) single trajectory, count data using glm.
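
Items b and d connect directly: with a single dichotomous predictor, the logistic regression coefficient is the log of the 2x2 table odds ratio. Below is a minimal sketch using the built-in Titanic table (collapsed to Sex by Survived). The object names are illustrative assumptions and this is not a reproduction of the course R sessions.

```r
# 2x2 table, odds ratio, and the matching logistic regression.
# Titanic is a 4-way table (Class x Sex x Age x Survived) that ships with R.
data("Titanic")

# Collapse to a 2x2 table: Sex (rows) by Survived (columns)
tab <- margin.table(Titanic, c(2, 4))
print(tab)
print(prop.table(tab, 1))   # survival proportions within each sex

# Test of independence (Yates continuity correction is the 2x2 default)
print(chisq.test(tab))

# Odds of survival within each sex, and the Female:Male odds ratio
odds <- tab[, "Yes"] / tab[, "No"]
or_table <- unname(odds["Female"] / odds["Male"])

# Same odds ratio from logistic regression on the frequency-weighted cells
titanic_df <- as.data.frame(Titanic)
fit <- glm(Survived ~ Sex, family = binomial(link = "logit"),
           weights = Freq, data = titanic_df)
or_glm <- unname(exp(coef(fit)["SexFemale"]))

print(c(table = or_table, glm = or_glm))
```

Adding Class or Age to the model formula gives conditional odds ratios, which is one way to look for the reversals discussed under item c.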


WEEK 4 (10/24) Multilevel data, Mixed effects models
a. Robinson (1950) ecological correlations   2x2 table ex
b. High School and Beyond data. Descriptive analyses and multilevel modeling via lme4 package
      i.  Background:       Collection of HSB data analyses from various text sources      A nice teaching document from Indiana that does HSB data with every known statistical package (including lmer)
      ii.  Our analyses in R.         complete Bryk dataset     first pass, Bryk data:   session    plots 
            Stat209 Lab 2; HSB analysis using lme4, lmer  (includes creation of single data set)  Lecture slide, lme lmer for Bryk data   side-by-side boxplots, SFYS analysis      HSB: analysis of covariance on group means        school means dataset, HSB ancova
            Background:    John Fox lme tutorial     Stat209 base Lab2 (nlme legacy version) with extended data management and lmList materials
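
A minimal sketch of a random-intercept analysis of the kind listed under item b. It assumes the MathAchieve data shipped with the nlme package, which appears to be the Bryk and Raudenbush HSB excerpt, and it uses lme from nlme rather than lmer so that it runs with a standard R installation. The object names are illustrative and this is not the Stat209 lab code.

```r
# Random-intercept models for students nested in schools, using nlme.
# Assumption: MathAchieve (in package nlme) stands in for the course HSB file.
library(nlme)
data("MathAchieve", package = "nlme")

# Unconditional model: math achievement varies around school-specific means
fit0 <- lme(MathAch ~ 1, random = ~ 1 | School, data = MathAchieve)

# Variance components and the intraclass correlation
vc <- VarCorr(fit0)
var_school <- as.numeric(vc["(Intercept)", "Variance"])
var_resid  <- as.numeric(vc["Residual", "Variance"])
icc <- var_school / (var_school + var_resid)
print(c(school = var_school, residual = var_resid, icc = icc))

# Add student-level SES as a fixed effect, keeping the random intercept
fit1 <- lme(MathAch ~ SES, random = ~ 1 | School, data = MathAchieve)
print(summary(fit1)$tTable)

# The lme4 counterpart would be (not run here):
#   lmer(MathAch ~ SES + (1 | School), data = MathAchieve)
```

The intraclass correlation is the share of total variance in math achievement that lies between schools.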



WEEK 5 (11/7) Student data analysis presentations