Education 401C Autumn 2014
Data analysis examples using R
David Rogosa
Sequoia 224, rag{AT}stat{DOT}stanford{DOT}edu
Course web page: http://web.stanford.edu/~rag/ed401/
For 2013 course materials go here
Data analysis examples using R. Ed401 Aut 2014 (1 unit)
Description
We will do basic and intermediate level statistical analysis examples
(of the sort that students will have seen in their courses) in R.
Examples include: descriptive statistics and plots, group comparisons, correlation and regression, categorical variables, multilevel data.
See http://web.stanford.edu/~rag/ed401/
Course Schedule
Five (2hr) mtgs Th 3:15-5
Oct 2 1. Descriptive stats; analysis of means (up through anova)
Oct 9 2. Correlation and regression (up through multiple regression, variable selection etc)
Oct 16 3. Categorical vars (tables, logistic regression)
Oct 23 4. Multilevel data; descriptives, plots, and intro to mixed-effects models (e.g. Bryk HSB data)
Nov 6 5. Student analyses (students present a small analysis of their own)
Class location: Our assigned room is art4 in the lower level of Cummings Art Bldg. For reasons I cannot explain, School of Education staff inserted multiple contradictory locations on the Axess page, which after months of effort have been expunged. This page, which is out of the reach of School of Education staff, is your best source of logistics info. My apologies for any confusion.
Getting started (download and install R)
1/7/09. NY Times endorses R: Data Analysts Captivated by R's Power
Current version of R is version 3.1.1 (Sock it to Me) released on 2014-07-10
For references and software: The R Project for Statistical Computing Closest download mirror is Berkeley
Many students employ RStudio to enhance their R-enjoyment. I won't use it, but it serves very well especially on a single screen (e.g. portable) machine. "RStudio IDE is a powerful and productive user interface for R.
It's free and open source, and works great on Windows, Mac, and Linux." A short R-intro that includes RStudio
Resources
The greatest challenge here is not being overwhelmed by all the options.
0. Reference Cards and other short documents section of CRAN page
1. When I taught the introductory course Stat141, the text for computing was Using R for Introductory Statistics, J. Verzani, Chapman & Hall, 2005.
An online version is available from John Verzani's page. Alternate version: single pdf. UsingR R-package
2. In Stat209 a primary resource for R and data analysis is Data analysis and graphics using R (2007) J. Maindonald and J. Braun,
Cambridge 2nd edition 2007. 3rd edition 2010 short draft version in CRAN Text resource page
3. A handbook of statistical analyses using R (second edition). Brian Everitt, Torsten Hothorn CRC Press, Index of book chapters Stanford access   Chaps 2-7 relevant to our materials. Data sets etc Package 'HSAUR2'
4. From CRAN central: An Introduction to R Notes on R: A Programming Environment for Data Analysis and Graphics Version 3.0.1 (2013-05-16) W. N. Venables, D. M. Smith and the R Core Team
WEEK 1 (10/2) Descriptive Statistics, Group Comparisons (including anova)
Core Examples
a. Harvey Goldstein's Exam data, single school excerpt. School14 data (ascii) data analysis session Sch14 graphics data documentation extra: UsingR function ex
b. Andrew Gelman's Sesame Street data (read in stata file using foreign package) data import and analysis session write out augmented and subsetted data sets
  manual for package foreign advanced data import guide: R Data Import/Export
c. One-way anova with Tukey multiple comparisons, Harrington data data analysis session (oneway anova and mult comp)
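As a rough sketch of the workflow in item c (descriptives by group, a one-way anova, then Tukey multiple comparisons), the following uses the built-in `PlantGrowth` data as a stand-in. That choice is an assumption made so the code runs anywhere; it is not the Harrington data and not the class session itself.

```r
# One-way anova followed by Tukey multiple comparisons.
# PlantGrowth (built into R) is a stand-in here, NOT the Harrington data.
data(PlantGrowth)

# Descriptive statistics: mean weight within each group
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
print(group_means)

# Fit the one-way model and print the anova table
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))

# Tukey honest significant differences for all pairwise group comparisons
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)
```

`plot(tukey)` draws the pairwise confidence intervals; an interval that excludes zero indicates a pair of groups whose means differ at the chosen family-wise level.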
Extra Items
d. Factorial Designs; Two-way anova (stat141 ex) Soybeans. Soybean data (ascii) Textbook description Stat141 analysis SW text exs in RforBiologists, soybean p. 30
e. Functions and Loops in R. Verzani pdf text p.6, std function; Verzani pdf text p.47, Central Limit Theorem simulation. Stat141 handout Verzani text in a single pdf see also Chap. 9,10 of An Introduction to R (#4 above).
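A minimal sketch of the two ideas in item e, a user-written function and a simulation loop, might look like the following. The function name `std` echoes the item above, but the body, the exponential distribution, the sample size, and the seed are all assumptions chosen for illustration, not a copy of the Verzani text.

```r
set.seed(401)  # arbitrary seed, only so the run is reproducible

# A small user-written function: standard deviation from its definition
std <- function(x) {
  sqrt(sum((x - mean(x))^2) / (length(x) - 1))
}

# Central Limit Theorem simulation with an explicit loop:
# each pass stores the mean of n draws from a skewed (exponential) distribution
n_reps <- 2000
n <- 30
xbar <- numeric(n_reps)
for (i in seq_len(n_reps)) {
  xbar[i] <- mean(rexp(n, rate = 1))
}

# The sample means should center near 1 with SD near 1/sqrt(n)
print(c(mean = mean(xbar), sd = std(xbar), theory_sd = 1 / sqrt(n)))
```

`hist(xbar)` or `qqnorm(xbar)` shows how close the distribution of the means is to normal; rerunning with a smaller `n` makes the skew of the underlying distribution visible again.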
WEEK 2 (10/09) Correlation and Regression
Core Examples
a. Bivariate data
i. correlation and scatterplots platelet session platelet plots platelet data extra stat141 example Brain and Body Weights for 62 Species of Land Mammals
ii. Straight-line regression single subject Sleepstudy example R session plots and handout version
b. Multiple regression and interpretation of coefficients. MT woes of regression coefficients slides R-session. Coleman data: adjusted-variables multiple regression data file, 20 schools Adjusted variable plot
more Coleman. using pairs command generating a larger version of Coleman data and more plots
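The adjusted-variable idea in item b can be sketched as follows: regress both the outcome and the predictor of interest on the remaining predictor, then regress residuals on residuals. The built-in `mtcars` data and the variables chosen are assumptions used as a stand-in; this is not the Coleman data file.

```r
# Adjusted-variable (added-variable) regression.
# mtcars (built into R) is a stand-in here, NOT the Coleman data.
full <- lm(mpg ~ wt + hp, data = mtcars)

# Adjust the outcome and the predictor of interest for the other predictor
res_y <- resid(lm(mpg ~ hp, data = mtcars))  # mpg adjusted for hp
res_x <- resid(lm(wt ~ hp, data = mtcars))   # wt adjusted for hp

# The slope for the adjusted variables equals the multiple regression coefficient
adj <- lm(res_y ~ res_x)
print(c(multiple = unname(coef(full)["wt"]),
        adjusted = unname(coef(adj)["res_x"])))
```

`plot(res_x, res_y); abline(adj)` gives the adjusted variable plot itself, and `pairs(mtcars[, c("mpg", "wt", "hp")])` gives the scatterplot matrix for the same three variables.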
Extra Items
c. Instrumental variables regression. Estimating the Return to Education for Married Women (Wooldridge text Ex 15.1).
Mroz87 data Mroz87 data description IV data analysis session Wooldridge stata ivreg
10/15 Background exposition for IV and returns to schooling: Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Joshua D. Angrist; Alan B. Krueger, The Journal of Economic Perspectives Vol. 15, No. 4 (Autumn, 2001), pp. 69-85
d. Missing data and multiple imputation methods.
i. single variable. Stef van Buuren example
ii. traditional bivariate multivariate data methods, correlation and regression example
iii. Multiple Imputation.
nhanes data in package mice R-session using mice package
Background materials, Multiple Imputation in R: van Buuren S and Groothuis-Oudshoorn K (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. See also multiple imputation online; Flexible Imputation of Missing Data, Stef van Buuren, Chapman and Hall/CRC 2012 (book contents online, book extras). He is the originator of mice. R resources: Multivariate Analysis Task View, Missing data section, esp packages mice
Addendum on Publishing. In response to a student question, I mentioned sweave and alternatives. Students use these, even for problem sets, but these do have a learning curve, especially if you are unaware of TeX etc.
So I gathered together some quick resources, esp for use within RStudio, where use of sweave is facilitated.
RStudio help. Using Sweave and knitr also Using Sweave and LaTeX with R 3.0.2 Rstudio support queries: 1 2
Some additional intro docs. San Diego State UW Montana Wharton,UPenn Germany Minnesota
Also the latex command from the Hmisc package
WEEK 3 (10/16) Categorical Data, Generalized Linear Models Halloween edition
Note: It would be good to take a few minutes at the beginning of class to take stock of our progress and its efficacy, now that we've had two of our four presentation sessions.
Core Examples
a. Proportions (1x2) and (1xK) tables. proportions R session For Titanic data, music Pete Seeger - The Titanic
b. 2x2 and rxc tables; independence and odds ratios. R session, Titanic and nightmares
c. 2x2x2 tables; Simpson's paradox. Death penalty example (Agresti)
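For the 2x2 case in item b, a minimal sketch using the `Titanic` table that ships with R might look like this. Collapsing to Sex by Survived is an assumption made for illustration; the class session may cut the table differently.

```r
# Collapse the built-in 4-way Titanic table (Class, Sex, Age, Survived)
# to a 2x2 table of Sex by Survived
tab <- margin.table(Titanic, c(2, 4))
print(tab)

# Row proportions: survival rate within each sex
print(prop.table(tab, 1))

# Chi-squared test of independence
print(chisq.test(tab))

# Sample odds ratio: odds of survival for females relative to males
odds_ratio <- (tab["Female", "Yes"] * tab["Male", "No"]) /
              (tab["Female", "No"] * tab["Male", "Yes"])
print(odds_ratio)
```

Keeping a third margin, for example `margin.table(Titanic, c(2, 4, 1))`, gives one Sex by Survived table per passenger class, the layout in which a reversal of the kind named in item c would show up.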
Extra Items
d. Dichotomous outcomes, logistic regression (glm logit link) Donner party data
Donner analysis handout Donner Rsession
e. Counts; more generalized linear models (log link) AIDS in Belgium R-session Source: AIDS in Belgium example, (from Simon Wood) single trajectory, count data using glm.
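A hedged sketch of item d, a dichotomous outcome fit with `glm` and a logit link, follows. It uses the built-in `Titanic` table converted to a data frame with cell counts as weights; this is an assumption made so the example is self-contained, and it is not the Donner party data.

```r
# Logistic regression with glm.
# The built-in Titanic table is a stand-in here, NOT the Donner party data.
titanic_df <- as.data.frame(Titanic)  # one row per cell, with a Freq count column

# Binomial family with logit link; Freq weights each cell by its count
fit <- glm(Survived ~ Sex, family = binomial(link = "logit"),
           data = titanic_df, weights = Freq)
print(summary(fit))

# Exponentiated slope: odds ratio of survival, female relative to male
print(exp(coef(fit)))
```

For count outcomes as in item e, the same call takes `family = poisson(link = "log")` with the count as the response.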
Addendum on scripts. Introduction to the R Project for Statistical Computing for use at ITC Appendix B ; A (very) short introduction to R scripts section; Kickstarting R - Writing R scripts
WEEK 4 (10/23) Multilevel data
a. Aggregation and ecological correlations Robinson (1950) 2x2 table ex
b. High School and Beyond data. complete Bryk dataset Data construction from files in the MEMSS
c. First pass, Bryk data: session plots
d. Additional plots for Multilevel data. R session xyplots
e. Comparison of Public and Catholic Schools using lme4 Description with lmList side-by-side boxplots, SFYS analysis
f. Ancova on school means (school level) HSB: analysis of covariance on group means school means dataset, HSB ancova
Background: Lecture slide, lme lmer for Bryk data Collection of HSB data analyses from various text sources A nice teaching document from Indiana that does HSB data with every known statistical package (including lmer) John Fox lme tutorial Stat209 base Lab2 (nlme legacy version) with extended data management and lmList materials
Stat209 Lab 2; HSB analysis using lme4, lmer (includes creation of single data set)
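As a starting point for the HSB analyses above, here is a minimal sketch using the `nlme` package, which ships with standard R installations and carries a version of the High School and Beyond data as `MathAchieve`. Using `nlme` rather than `lme4`, and this particular random-intercept model, are assumptions made to keep the example self-contained; they are not the models from the class sessions.

```r
library(nlme)  # recommended package, installed with standard R distributions

data(MathAchieve)  # student-level HSB records: School, SES, MathAch, ...

# Separate straight-line fits of math achievement on SES within each school
by_school <- lmList(MathAch ~ SES | School, data = MathAchieve)
print(head(coef(by_school)))

# Random-intercept model: students nested within schools
fit <- lme(MathAch ~ SES, random = ~ 1 | School, data = MathAchieve)
print(summary(fit))
print(fixef(fit))
```

In `lme4` the analogous random-intercept call would be `lmer(MathAch ~ SES + (1 | School), data = MathAchieve)`. School-level variables such as sector are held in the companion `MathAchSchool` data frame and need to be merged in before a public versus Catholic comparison.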
WEEK 5 (11/6) Student data analysis presentations