Homework 2 [revised 10/21/2011]


Note: Unless otherwise noted (new Q4), do these exercises without using weights.



1) Using the 2000 March CPS, compare the 1999 wage and salary earnings (variable incwage) of lawyers (occ1990==178), nurses (occ1990==95), and sociology teachers (occ1990==125).


            a) How many individuals are there in these 3 occupations in the 2000 CPS?

            b) What are the average income, the median income, the 25th percentile, and the 75th percentile of income for each occupation?

            c) How much overlap is there between the incomes of the 3 professions?

            d) Make a box graph of the income distribution of the 3 occupations, side by side. Copy the box plot into MS Word or print it to a file so you can access it and copy it into your homework later.

            e) What is the standard deviation of the income for each group? What is the standard error of the mean for each group? Why is the standard error of the mean for sociology teachers’ income relatively large?

            f part 1) Use Excel or some other means to calculate the T-statistics for each comparison. Calculate the T-statistics with unequal variance and the T-statistics with equal variance (see my notes on mean and variance for the formulas). Compare these T-statistics to the appropriate Normal distribution and T-distribution tables in the back of the book (or use Stata to generate the Normal and T-distribution probabilities), and use these to answer the question in part f:

            f continued) Based on your hand calculations, did lawyers earn significantly more than nurses in 1999? How about lawyers and sociologists? Nurses and sociologists? Based on your calculations, what is the probability that the income gap between each of the 3 pairs is a result of random sampling variation?

            f continued [NEW for Soc 381 in 2012/ Soc 180B can ignore this part of Q1f]: Use Stata to generate the 2-tailed probabilities, both Normal and T-distribution, associated with each of the 6 T-tests (3 different comparisons, each under both the equal and the unequal variance assumption). Explain what event each probability is the probability of. Explain why the Normal and T-distribution probabilities are different, and whether and why the differences are greater in the equal variance or in the unequal variance T-tests.

            g) Use Stata to do 2-sample T-tests on each of the 3 pairs of income comparisons (use the “unequal” option to create the unequal variance t-statistic). What do the results show? Note that the ttest function does not allow weights.

            h) Use Stata to do a simple regression on incwage, with occ1990 as the only predictor (generate dummy variables for occupation), and including only our 3 occupations. If you include only 2 occupations at a time (one dummy variable, one comparison category) you will get an answer similar to one of the T-tests; which T-test is it? If you include more than two groups in your regression, the T-statistics will be different from the 2-sample T-tests. Comment on the regression results, i.e. the mean differences, the standard errors, and the significance of the income differences between groups.
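
For the hand calculation in part f, "some other means" could be a few lines of Python instead of a spreadsheet. The sketch below uses the standard textbook two-sample formulas, which may be laid out differently in the course notes. The function names and the summary numbers in the example are hypothetical placeholders, not CPS results. Only the Normal two-tailed probability is computed here; the T-distribution probability still has to come from a table or from Stata.

```python
import math

def t_unequal(m1, s1, n1, m2, s2, n2):
    """T-statistic without pooling: each group keeps its own variance."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (m1 - m2) / se

def t_equal(m1, s1, n1, m2, s2, n2):
    """T-statistic with a pooled variance (equal variance assumption)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

def normal_two_tail(t):
    """Two-tailed probability of a value at least |t| under the standard Normal."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical summary statistics (mean, SD, N) for two groups.
group_a = (50000.0, 20000.0, 100)
group_b = (45000.0, 15000.0, 60)

for label, func in (("unequal", t_unequal), ("equal", t_equal)):
    t = func(*group_a, *group_b)
    print(label, "variance: t =", round(t, 3), "Normal 2-tail p =", round(normal_two_tail(t), 4))
```

Replacing the placeholder tuples with the mean, SD, and N that Stata reports for each occupation gives the inputs for all 3 comparisons.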


2)         a) Show that Average(a+bXi)= a+b(Average(Xi)) (easy)

            b) Show that Variance(a+bXi)=b^2(Variance(Xi)). In this proof, do *not* assume the property Var(bX)=b^2Var(X) (and equivalently, don’t assume you know that SD(bX)=bSD(X)).


For this question, you should use the rules about summary notation and the rules about averages that I show on the first page of my "notes about mean and variance and sample statistics", that is, the definition of summary notation and items 1 and 2. Also, you should use the definition of variance, which is item 4 in my "notes about mean and variance and sample statistics."
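
Before writing the proof, it can help to confirm both identities numerically. The sketch below is a sanity check, not a proof, and the data and the constants a and b are made up. It uses the population variance (dividing by n); the course notes may define variance with n-1 instead, and the identity holds under either definition.

```python
import math
from statistics import mean, pvariance

x = [12.0, 15.0, 9.0, 20.0, 14.0]   # made-up data
a, b = 3.0, -2.0                    # made-up constants
y = [a + b * xi for xi in x]        # the transformed values a + b*Xi

# Average(a + bXi) compared with a + b*Average(Xi)
print(mean(y), a + b * mean(x))

# Variance(a + bXi) compared with b^2 * Variance(Xi)
print(pvariance(y), b**2 * pvariance(x))
```

Each printed pair should agree up to floating-point rounding, including when b is negative.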


3) Now create a new variable, incwage2:

gen incwage2=incwage*2


            a) How are the average and standard deviation of incwage2 different from the average and standard deviation of incwage?

            b) Redo the Excel comparisons of income between the three professions, using incwage2 instead of incwage. How are the results different? How are they the same?

            c) Use Stata to generate 2-sample T-test comparisons of earnings between the 3 pairs of occupations. How do the T-statistics compare using incwage2 and incwage?

            d) Run the regressions with occupation predicting income, and answer the same question: does incwage2 give different results than incwage?


4) Redo the regressions of incwage on occupation, comparing sociologists’ incomes to nurses’ incomes using the weights, first with aweight and then with fweight. Now use the weights to calculate the same T-statistic by hand (“by hand” in this context means to generate the weighted N, mean, and SD in Stata, and then copy those values into a copy of Rosenfeld’s Excel sheet T-test calculator). How would you interpret the T-statistic produced by regress using the aweight? How would you interpret the T-statistic produced by regress using the fweight?
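
To see where the weighted N, mean, and SD come from, the sketch below reproduces the arithmetic outside Stata. It rests on an assumption about how the two weight types are usually described: frequency weights count each record as that many identical observations, while analytic weights are rescaled so the number of observations stays the number of records. The function name and the tiny data set are hypothetical, so check the output against what Stata itself reports.

```python
import math

def weighted_summary(x, w, kind):
    """Return (N, mean, SD) under an assumed fweight or aweight convention."""
    sum_w = sum(w)
    mean = sum(wi * xi for xi, wi in zip(x, w)) / sum_w
    # Weighted sum of squared deviations from the weighted mean.
    ss = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w))
    if kind == "fweight":
        n = sum_w        # each record stands for w identical observations
    elif kind == "aweight":
        n = len(x)       # weights are rescaled, so N is the record count
    else:
        raise ValueError("kind must be 'fweight' or 'aweight'")
    variance = (ss / sum_w) * n / (n - 1)
    return n, mean, math.sqrt(variance)

# Hypothetical incomes and weights, for illustration only.
incomes = [30000.0, 42000.0, 55000.0]
weights = [2, 1, 1]

print(weighted_summary(incomes, weights, "fweight"))
print(weighted_summary(incomes, weights, "aweight"))
```

Under these assumptions the two calls return the same weighted mean but a different N and SD, and those are the values that go into the T-test calculator.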