Introduction to Data Analysis for Sociology Graduate Students
rev: 10/29/2021
Syllabus
Fall Quarter, 2021
Mondays and Wednesdays
11A-1P
Bldg 200, room 201
Lab/Section once a week for homework and once a week for projects
Michael J. Rosenfeld
Professor
Department of Sociology
Building 120 room 124
The class website is my personal Stanford website
Office Hours TBA
TAs:
Terresa Eun
Jan Voelkel
Introduction:
In this class you will teach yourself basic statistics including regression, how to do statistical analysis, and how to find flaws and problems with statistical analyses.
In the process of learning about data analysis you will also learn about demography and stratification in the U.S., because the dataset is the Current Population Survey of March 2000, a nationally representative survey of more than 60,000 households with detailed information about race, gender, income, occupation, place of residence, and so on. You'll also learn how to use one of the most powerful and flexible tools for data analysis, the statistical software Stata.
Most class materials will be posted on my website (www.stanford.edu/~mrosenfe). We will use Canvas for collecting and returning homework and presentation drafts, collecting presentation slides, posting grades, and sending group emails.
The situation we are in:
We are still in the middle of a deadly global pandemic. Health and safety are our first priority. Class will meet in person, but to make this work everyone will need to wear a mask covering nose and mouth for the entirety of the class. You will also be required to comply with Stanford's testing protocol, which you can demonstrate with the green check mark on your health check app. Students will be asked to show their Stanford Health Check green badge before every class, so keep up your testing regimen. See the university guidelines.
Readings and Grading Policy
Books required (available at Stanford Bookstore):
* Tufte, Edward. 2001. The Visual Display of Quantitative Information. Graphics Press. ISBN-10: 0961392142. $30
* Treiman, Donald. 2009. Quantitative Analysis: Doing Social Research to Test Ideas. Jossey-Bass. ISBN-10: 0470380039. $59
* Silver, Nate. 2012. The Signal and the Noise: Why So Many Predictions Fail - but Some Don’t. ISBN-10: 0143125087. $16
Recommended Books:
* Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach, by Andrew F. Hayes. Second Edition. 2018. Guilford Press. ISBN-13: 9781462534654. $48
* Mathematical Statistics and Data Analysis, by John Rice, Duxbury Press, 3rd edition 2006, ISBN-10: 0534399428. $175
* Freedman, David, Robert Pisani, and Roger Purves. 2007. Statistics. Fourth Edition. W.W. Norton. $125. ISBN-10: 0393929728
The most important readings for the class are the Excel files, Stata logs, and PDF documentation posted on my website. Aside from the Tufte book, which we will be going over page-by-page in class, the other books are all supplementary. You don’t need the books. Briefly, here is why you should own them anyway:
* Treiman is an excellent book about social statistics (using Stata), which covers some practical aspects of data analysis that we won’t get to in this class. Treiman’s book was written for Sociology PhD students.
* Freedman is a classic introductory text about statistics, with no math, but with very good plain English explanations. If you don’t have a math background, Freedman’s explanations may be helpful to you. If you do have a math background, Freedman may help you explain statistics to other people. And if you end up teaching undergraduate statistics in the future, you may be teaching from Freedman.
* Rice is a classic introduction to statistics for readers who have at least a modest familiarity with calculus. Rice offers outlines of proofs, a fairly deep discussion of probability theory, and lots of great problems you can work through on your own. Rice is a great reference book that you should have on your shelf if you plan on doing any data analysis.
* Silver is a brilliant book about some practical applications and mis-applications of statistical thinking in the everyday world.
* Hayes is a really useful book about mediation and moderation analyses, with very thoughtful plain English explanations. The programs Hayes has written are built on SAS, which is not the software we will be using. Hayes is a generally useful resource but we will recreate the methods with tools in Stata.
Software Required
* You will need Stata in order to do the homework for Soc 381. You have several options:
1) You can use Stata over Stanford’s Farmshare unix network. This is free but a little cumbersome. See notes on the class webpage.
2) Purchase a Stata license and run it on your own computer
https://www.stata.com/order/new/edu/profplus/student-pricing/
Stata/BE ($225 perpetual license) is sufficient for this class. Stata/SE will allow you to load larger datasets like the entire GSS ($425 perpetual license).
3) There may be an option to run Stata on a student computer cluster in building 120. I am checking in to that.
4) If you are an R expert (and by expert I mean you have used R for all kinds of data analysis before, and you will not need any help translating the class assignments from Stata into R), then you can request permission to do the homeworks all in R. But note: class will be entirely in Stata.
Students with Disabilities:
Students with Documented Disabilities: Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://studentaffairs.stanford.edu/oae).
Units:
This course justifies an additional unit of credit, beyond what would be expected based on the typical assignment of class time and outside work. An additional unit represents, on average, 30 additional hours of work expected of a student during the quarter, devoted to homework and to the preparation of the student’s research presentation.
Computer Use Policy:
* Computer use by students during class is strictly limited to following along with the data analysis examples being presented by the professor.
Grading:
| 4 homeworks | 50% |
| Regular section participation | 5% |
| Your data project outline | 5% |
| In-class presentation (data analysis of dataset of your own choosing) | 10% |
| Final Exam | 30% |
Project and Reading Assignment Timeline
(Note: my chapter and section headings for Rice are from the 2nd edition; the same material should be in the 3rd edition but you may have to look for it).
| Week | Class | Class lecture goals | Reading (Readings in bold are required and will be discussed specifically in that class. Other readings are supplementary) | Assignment |
| 1 | Sept 20 | Introduction to Stata and Data Analysis Section; Basics of descriptive data analysis using Stata | Read Treiman’s chapters 1-4. Read Rosenfeld’s online Stata guide | Hand out CPS HW #1 |
| | Sept 22 | Observational Studies and their limitations | Freedman Ch 2, 4 | |
| | Section | Work on HW 1 and on using Stata | | |
| 2 | Sept 27 | Error and bias | Freedman Ch 6; Silver Ch 1, 4 | |
| | Sept 29 | Probability sampling, sample size and power, and standard errors | Freedman Ch 20; read also Treiman Ch 9; Rice, ch. 6 on “Distributions derived from the Normal Distribution” | HW #1 due; hand out HW #2 |
| | Section | Stata, and HW 2 | | |
| 3 | Oct 4 | More on sample size and power | Freedman Ch 21; Rice, section 11.3 on “Comparing Paired Samples” | |
| | Oct 6 | Introduction to regression with Stata | Freedman Chs 9, 10; Treiman, Ch 5-6; Hayes, Ch 2 | |
| | Section | Work on Stata, discuss the issues in HWs 2 and 3 | | HW #2 due Oct 8; hand out HW #3 |
| 4 | Oct 11 | More on regression with Stata, interpreting coefficients | Freedman, Ch 11; Rice ch. 14, “Linear Least Squares” | |
| | Oct 13 | Problems with and difficulties in using regression; graphing | Freedman Ch 12 | |
| | Section | Work on Stata, discuss the issues in CPS HW #3 | | |
| 5 | Oct 18 | Logistic regression | Treiman chapter 13; Rice section 8.5, “The Method of Maximum Likelihood” | |
| | Oct 20 | Logistic regression and the likelihood ratio test | Treiman p. 264-276; Rice section 9.3 on the “Neyman-Pearson Lemma”, section 9.4 on “Confidence Intervals and Hypothesis Tests”, and section 9.5 on “Generalized Likelihood Ratio Tests” | |
| | Section | Work on Stata | | |
| 6 | Oct 25 | Mediation analysis part 1 | Hayes, Chapter 3 | HW #3 due; hand out HW #4 |
| | Oct 27 | Mediation analysis part 2 | | |
| | Section | Work on HW 4 | | |
| 7 | Nov 1 | More on mediation | | |
| | Nov 3 | Outliers | The Jasso v. Udry debate is required reading: 1) Jasso's original article on coital frequency. 2) Kahn and Udry's critique. 3) Jasso's response. See also: Silver, Ch 2 and 6 | |
| | Section | Work on HW 4 and projects | | |
| 8 | Nov 8 | Presentation of Data | Tufte, read the entire book (required) | |
| | Nov 10 | Some additional and advanced topics | | |
| | Section | Work on projects | | HW #4 due Nov 12 |
| | Nov 15 | Some additional advanced topics | | |
| | Nov 17 | More advanced topics | | |
| | | | | Presentation Proposals Due Nov 19 |
| 9 | Nov 22 | No class; Thanksgiving Break | | |
| | Nov 24 | No class; Thanksgiving Break | | |
| | Section | Break | | |
| 10 | Nov 29 | Student Presentations | | |
| | Dec 1 | Student Presentations and Final Exam Review | | |
| | Section | Exam review | | |
| Final Exam | | In-class final exam at the regularly scheduled time and place: TBA | | |