--------------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\Soc180B_spr2019_logs\class12_
> log.log
log type: text
opened on: 9 May 2019, 14:22:18
. *(8 variables, 11 observations pasted into data editor)
* First I opened the Anscombe dataset in Excel (the file is on my website, right next to the hw4 assignment: https://web.stanford.edu/~mrosenfe/soc_meth_proj3/Anscombe%27s_data.xls ). I copied the data, including the column headers, and pasted it into Stata's Data Editor. I told Stata to treat the first row as variable names, and then I clicked OK. The variables (x1, y1, x2, y2, etc.) then showed up in Stata.
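* A scripted alternative to copying and pasting, as a minimal sketch: import excel reads a spreadsheet directly, and its firstrow option treats the first row as variable names. The file name anscombe.xls is hypothetical; the sketch assumes the spreadsheet was saved under that name in Stata's working directory.

```stata
* Hypothetical local file name (an assumption, not from the class materials)
local xls "anscombe.xls"

* Only attempt the import if the file is actually there
capture confirm file "`xls'"
if _rc == 0 {
    * firstrow: use the first spreadsheet row as variable names
    import excel using "`xls'", firstrow clear
    describe
}
else {
    display as error "`xls' not found; save the spreadsheet to the working directory first"
}
```

* Keeping the import in a do-file makes the data loading step repeatable, which pasting into the Data Editor is not.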
. *class starts here
* Run these scatter plots yourself to see what they look like. Tufte had all 4 of these plots on page 2 of his book.
. twoway (scatter y2 x2)
. twoway (scatter y1 x1)
. twoway (scatter y2 x2) (lfit y2 x2)
* The syntax above makes an XY scatter plot of y2 against x2. The second set of parentheses superimposes the best fit line (the regression line, drawn by lfit) on the same graph.
. regress y2 x2
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.97
       Model |  27.5000024     1  27.5000024           Prob > F      =  0.0022
    Residual |   13.776294     9  1.53069933           R-squared     =  0.6662
-------------+------------------------------           Adj R-squared =  0.6292
       Total |  41.2762964    10  4.12762964           Root MSE      =  1.2372

------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |         .5   .1179638     4.24   0.002     .2331475    .7668526
       _cons |   3.000909   1.125303     2.67   0.026     .4552978     5.54652
------------------------------------------------------------------------------
. regress y1 x1
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.99
       Model |  27.5100011     1  27.5100011           Prob > F      =  0.0022
    Residual |  13.7626904     9  1.52918783           R-squared     =  0.6665
-------------+------------------------------           Adj R-squared =  0.6295
       Total |  41.2726916    10  4.12726916           Root MSE      =  1.2366

------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5000909   .1179055     4.24   0.002     .2333701    .7668117
       _cons |   3.000091   1.124747     2.67   0.026     .4557369    5.544445
------------------------------------------------------------------------------
* The key point is that the regression line is nearly identical for each numbered pair of X and Y (slope of about 0.5, intercept of about 3.0), but the 4 scatter plots look very different.
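* To compare all four pairs in one pass, here is a minimal sketch that types in the data and loops over the pairs. The values are the ones published in Anscombe (1973) and are assumed to match the spreadsheet; the graph names g1 to g4 are made up for this example. The assert lines stop the do-file if any slope or intercept strays from 0.5 or 3.0.

```stata
clear
input x1 y1 x2 y2 x3 y3 x4 y4
10  8.04 10 9.14 10  7.46  8  6.58
 8  6.95  8 8.14  8  6.77  8  5.76
13  7.58 13 8.74 13 12.74  8  7.71
 9  8.81  9 8.77  9  7.11  8  8.84
11  8.33 11 9.26 11  7.81  8  8.47
14  9.96 14 8.10 14  8.84  8  7.04
 6  7.24  6 6.13  6  6.08  8  5.25
 4  4.26  4 3.10  4  5.39 19 12.50
12 10.84 12 9.13 12  8.15  8  5.56
 7  4.82  7 7.26  7  6.42  8  7.91
 5  5.68  5 4.74  5  5.73  8  6.89
end

forvalues i = 1/4 {
    * Fit y_i on x_i and report the coefficients on one line
    quietly regress y`i' x`i'
    display "pair `i': slope = " %6.4f _b[x`i'] "  intercept = " %6.4f _b[_cons] "  R-squared = " %6.4f e(r2)

    * All four fitted lines should be close to y = 3 + 0.5x
    assert abs(_b[x`i'] - 0.5) < 0.001
    assert abs(_b[_cons] - 3) < 0.01

    * Keep each scatter plot in memory without drawing it yet
    twoway (scatter y`i' x`i') (lfit y`i' x`i'), name(g`i', replace) nodraw
}

* Show the four plots side by side
graph combine g1 g2 g3 g4
```

* The combined graph puts the four very different point clouds next to the nearly identical coefficients printed by the loop.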
. twoway (scatter y3 x3) (lfit y3 x3)
. twoway (scatter y4 x4) (lfit y4 x4)
. clear all
*Now on to the 50 state dataset, which is also right next to my HW4 assignment, on my class homepage.
. twoway (scatter incwage NH_White_proportion, mlabel(statefip)) (lfit incwage NH_White_proportion)
. summarize NH_White_proportion
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
NH_White_p~n |        51    .7626632    .1633623   .2354178   .9835737
. regress incwage NH_White_proportion
      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  1,    49) =    2.14
       Model |  18878316.5     1  18878316.5           Prob > F      =  0.1500
    Residual |   432407199    49  8824636.71           R-squared     =  0.0418
-------------+------------------------------           Adj R-squared =  0.0223
       Total |   451285515    50   9025710.3           Root MSE      =  2970.6

-------------------------------------------------------------------------------------
            incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
NH_White_proportion |   -3761.36   2571.649    -1.46   0.150    -8929.282    1406.562
              _cons |   22161.23   2004.928    11.05   0.000     18132.18    26190.29
-------------------------------------------------------------------------------------
. predict M1_predicted
(option xb assumed; fitted values)
* One key step after a regression is to generate a new variable holding the regression's predicted (fitted) values. The predict command above does this, and we call the new variable “M1_predicted.”
. twoway (scatter incwage NH_White_proportion, mlabel(statefip)) (lfit incwage NH_White_proportion) (connected M1_predicted NH_White_proportion)
* The syntax above takes the 50 state dataset, plots average state income (Y axis) against the non-Hispanic white proportion (X axis), and attaches the state name as a label to each point. The lfit line is plotted on top, and the predicted values are plotted on top of that, showing that the predicted values from the regression fall exactly on the lfit line.
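* The claim that the predicted values sit on the lfit line can also be checked numerically. This is a minimal sketch on Stata's built-in auto dataset, used as a stand-in so the example runs without the 50 state file; price, weight, price_hat and by_hand belong to this example, not to the class data.

```stata
* Stand-in data: the auto dataset ships with Stata
sysuse auto, clear
regress price weight

* predict with no option returns the linear prediction (xb)
predict price_hat

* Rebuild the same values from the stored coefficients
gen double by_hand = _b[_cons] + _b[weight]*weight

* The two agree up to float rounding; assert stops with an error otherwise
assert reldif(price_hat, by_hand) < 1e-6

* lfit and the line through the predictions trace the same path
twoway (scatter price weight) (lfit price weight) (line price_hat weight, sort)
```

* The same check would apply to M1_predicted, with incwage and NH_White_proportion in place of price and weight.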
. gen residual=incwage-M1_predicted
* From the predicted values, we generate a variable for the residual, which is actual minus predicted.
. gen abs_residual=abs(residual)
* We care about which points are furthest from the best fit line, so we generate a new variable with the absolute values of the residuals.
. gsort -abs_residual
* The above command sorts our 50 state dataset according to absolute value of their residual from the last regression, from largest to smallest.
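* The residual, absolute value, and sort steps above can be sketched in one short do-file. This version again assumes Stata's built-in auto dataset as a stand-in, and the variable names fitted, resid and abs_resid are made up for the example. It also shows that predict with the residuals option gives the same numbers as subtracting by hand.

```stata
sysuse auto, clear
regress price weight

* Fitted values and residuals straight from predict
predict double fitted, xb
predict double resid, residuals

* The residual is actual minus predicted
assert abs(resid - (price - fitted)) < 1e-6

* Distance from the line, ignoring direction
gen double abs_resid = abs(resid)

* The minus sign asks gsort for descending order
gsort -abs_resid
assert abs_resid[_n] >= abs_resid[_n+1] if _n < _N

* The five observations furthest from the regression line
list make price fitted resid in 1/5
```

* Plain sort only sorts in ascending order, which is why gsort is used here.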
. dfbeta( NH_White_proportion)
_dfbeta_1: dfbeta(NH_White_proportion)
* The dfbeta command is another post-regression command. It asks Stata to generate dfbetas for specific predictors; here we have only one predictor, NH_White_proportion. Don’t worry about the units of the dfbetas. The dfbetas tell us which of the 50 states have the most influence on the slope of the regression line. It turns out that while CT and NJ are furthest from the line, DC has the most influence because it is furthest from the other points. You will have to plot the points to see what I mean.
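* One way to see what a dfbeta measures is to drop the most influential observation and refit. A dfbeta is the change in the coefficient from including that observation, scaled by the coefficient's standard error, so its sign tells you which way the point pulls the slope. The sketch below assumes Stata's built-in auto dataset as a stand-in; b_all, d1 and abs_dfb are names made up for the example.

```stata
sysuse auto, clear
regress price weight
scalar b_all = _b[weight]

* dfbeta stores its result in _dfbeta_1
dfbeta weight
gen double abs_dfb = abs(_dfbeta_1)

* Put the most influential observation first
gsort -abs_dfb
scalar d1 = _dfbeta_1[1]
list make weight price _dfbeta_1 in 1/3

* Refit without that observation and compare slopes
regress price weight if _n > 1
display "slope with all cars:        " b_all
display "slope without the top car:  " _b[weight]

* A positive dfbeta means the observation raised the slope, and vice versa
assert (b_all - _b[weight]) * d1 > 0
```

* In the class data, DC has a positive residual and a dfbeta of -.59, so including DC makes the slope more negative than it would otherwise be.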
. gen abs_dfbeta=abs( _dfbeta_1)
. list statefip abs_residual residual abs_dfbeta _dfbeta_1
     +--------------------------------------------------------------------+
     |             statefip   abs_re~l    residual   abs_df~a   _dfbeta_1 |
     |--------------------------------------------------------------------|
  1. |          Connecticut   5617.314    5617.314   .0487289    .0487289 |
  2. |           New Jersey   5432.736    5432.736   .1172426   -.1172426 |
  3. |           New Mexico   5139.932   -5139.932   .5375599    .5375599 |
  4. |              Montana   5024.921   -5024.921   .2145686   -.2145686 |
  5. |          Mississippi   5009.293   -5009.293   .2366829    .2366829 |
     |--------------------------------------------------------------------|
  6. |             Maryland   4840.275    4840.275   .1743481   -.1743481 |
  7. |        West Virginia   4798.199   -4798.199   .2921322   -.2921322 |
  8. |        Massachusetts   4666.598    4666.598    .098265     .098265 |
  9. |             Arkansas   4460.403   -4460.403   .0185422   -.0185422 |
 10. |             Colorado   4390.833    4390.833   .0056333    .0056333 |
     |--------------------------------------------------------------------|
 11. |         North Dakota   4204.203   -4204.203    .192666    -.192666 |
 12. |            Minnesota    4187.81     4187.81   .1740437    .1740437 |
 13. |               Alaska   3780.312    3780.312   .0506594   -.0506594 |
 14. |            Louisiana   3555.554   -3555.554   .1484154    .1484154 |
 15. |              Alabama   3416.563   -3416.563   .0482474    .0482474 |
     |--------------------------------------------------------------------|
 16. | District of Columbia   3381.999    3381.999   .5903278   -.5903278 |
 17. |             Michigan   3208.876    3208.876   .0356351    .0356351 |
 18. |        New Hampshire   3021.199    3021.199   .1813799    .1813799 |
 19. |         South Dakota   2937.162   -2937.162   .1303558   -.1303558 |
 20. |             Virginia   2706.413    2706.413   .0332524   -.0332524 |
     |--------------------------------------------------------------------|
 21. |             Illinois   2687.422    2687.422     .04929     -.04929 |
 22. |           Washington   2536.313    2536.313   .0665131    .0665131 |
 23. |             Oklahoma   2383.868   -2383.868   .0102179   -.0102179 |
 24. |                Idaho   2289.952   -2289.952    .070259    -.070259 |
 25. |             Kentucky   2257.391   -2257.391    .075205    -.075205 |
     |--------------------------------------------------------------------|
 26. |            Wisconsin   2079.558    2079.558   .0656909    .0656909 |
 27. |       South Carolina    1986.75    -1986.75    .022419     .022419 |
 28. |             Delaware   1970.206    1970.206   .0329438   -.0329438 |
 29. |              Wyoming   1860.133   -1860.133   .0957266   -.0957266 |
 30. |              Florida   1833.188   -1833.188   .0603009    .0603009 |
     |--------------------------------------------------------------------|
 31. |              Arizona    1755.73    -1755.73    .062599     .062599 |
 32. |               Hawaii   1728.106   -1728.106   .3419163    .3419163 |
 33. |         Rhode Island   1562.148    1562.148   .0499289    .0499289 |
 34. |             Missouri   1460.693    1460.693   .0429018    .0429018 |
 35. |             Nebraska     1231.4     -1231.4   .0429309   -.0429309 |
     |--------------------------------------------------------------------|
 36. |                 Ohio   1017.316    1017.316    .024228     .024228 |
 37. |               Kansas   1001.302   -1001.302    .021283    -.021283 |
 38. |             New York   998.9494    998.9494   .0335865   -.0335865 |
 39. |              Georgia   939.2884   -939.2884   .0433774    .0433774 |
 40. |                Texas    873.475    -873.475   .0654734    .0654734 |
     |--------------------------------------------------------------------|
 41. |                 Utah   544.7642   -544.7642   .0203697   -.0203697 |
 42. |                Maine    536.482    -536.482   .0362305   -.0362305 |
 43. |               Nevada   347.9611    347.9611   .0080656   -.0080656 |
 44. |              Vermont   328.0741   -328.0741   .0203959   -.0203959 |
 45. |           California   294.7873    294.7873   .0240138   -.0240138 |
     |--------------------------------------------------------------------|
 46. |               Oregon   241.7805    241.7805   .0072873    .0072873 |
 47. |         Pennsylvania    230.294    -230.294   .0063625   -.0063625 |
 48. |              Indiana   165.3883   -165.3883   .0053708   -.0053708 |
 49. |            Tennessee   89.31062    89.31062   .0009869    .0009869 |
 50. |       North Carolina   46.82497   -46.82497   .0009033    .0009033 |
     |--------------------------------------------------------------------|
 51. |                 Iowa   17.83441    17.83441   .0008664    .0008664 |
     +--------------------------------------------------------------------+
* Above is the list of states sorted from largest to smallest absolute residual. Below is the list of states sorted from largest to smallest absolute value of dfbeta.
. gsort - abs_dfbeta
. list statefip abs_residual residual abs_dfbeta _dfbeta_1
     +--------------------------------------------------------------------+
     |             statefip   abs_re~l    residual   abs_df~a   _dfbeta_1 |
     |--------------------------------------------------------------------|
  1. | District of Columbia   3381.999    3381.999   .5903278   -.5903278 |
  2. |           New Mexico   5139.932   -5139.932   .5375599    .5375599 |
  3. |               Hawaii   1728.106   -1728.106   .3419163    .3419163 |
  4. |        West Virginia   4798.199   -4798.199   .2921322   -.2921322 |
  5. |          Mississippi   5009.293   -5009.293   .2366829    .2366829 |
     |--------------------------------------------------------------------|
  6. |              Montana   5024.921   -5024.921   .2145686   -.2145686 |
  7. |         North Dakota   4204.203   -4204.203    .192666    -.192666 |
  8. |        New Hampshire   3021.199    3021.199   .1813799    .1813799 |
  9. |             Maryland   4840.275    4840.275   .1743481   -.1743481 |
 10. |            Minnesota    4187.81     4187.81   .1740437    .1740437 |
     |--------------------------------------------------------------------|
 11. |            Louisiana   3555.554   -3555.554   .1484154    .1484154 |
 12. |         South Dakota   2937.162   -2937.162   .1303558   -.1303558 |
 13. |           New Jersey   5432.736    5432.736   .1172426   -.1172426 |
 14. |        Massachusetts   4666.598    4666.598    .098265     .098265 |
 15. |              Wyoming   1860.133   -1860.133   .0957266   -.0957266 |
     |--------------------------------------------------------------------|
 16. |             Kentucky   2257.391   -2257.391    .075205    -.075205 |
 17. |                Idaho   2289.952   -2289.952    .070259    -.070259 |
 18. |           Washington   2536.313    2536.313   .0665131    .0665131 |
 19. |            Wisconsin   2079.558    2079.558   .0656909    .0656909 |
 20. |                Texas    873.475    -873.475   .0654734    .0654734 |
     |--------------------------------------------------------------------|
 21. |              Arizona    1755.73    -1755.73    .062599     .062599 |
 22. |              Florida   1833.188   -1833.188   .0603009    .0603009 |
 23. |               Alaska   3780.312    3780.312   .0506594   -.0506594 |
 24. |         Rhode Island   1562.148    1562.148   .0499289    .0499289 |
 25. |             Illinois   2687.422    2687.422     .04929     -.04929 |
     |--------------------------------------------------------------------|
 26. |          Connecticut   5617.314    5617.314   .0487289    .0487289 |
 27. |              Alabama   3416.563   -3416.563   .0482474    .0482474 |
 28. |              Georgia   939.2884   -939.2884   .0433774    .0433774 |
 29. |             Nebraska     1231.4     -1231.4   .0429309   -.0429309 |
 30. |             Missouri   1460.693    1460.693   .0429018    .0429018 |
     |--------------------------------------------------------------------|
 31. |                Maine    536.482    -536.482   .0362305   -.0362305 |
 32. |             Michigan   3208.876    3208.876   .0356351    .0356351 |
 33. |             New York   998.9494    998.9494   .0335865   -.0335865 |
 34. |             Virginia   2706.413    2706.413   .0332524   -.0332524 |
 35. |             Delaware   1970.206    1970.206   .0329438   -.0329438 |
     |--------------------------------------------------------------------|
 36. |                 Ohio   1017.316    1017.316    .024228     .024228 |
 37. |           California   294.7873    294.7873   .0240138   -.0240138 |
 38. |       South Carolina    1986.75    -1986.75    .022419     .022419 |
 39. |               Kansas   1001.302   -1001.302    .021283    -.021283 |
 40. |              Vermont   328.0741   -328.0741   .0203959   -.0203959 |
     |--------------------------------------------------------------------|
 41. |                 Utah   544.7642   -544.7642   .0203697   -.0203697 |
 42. |             Arkansas   4460.403   -4460.403   .0185422   -.0185422 |
 43. |             Oklahoma   2383.868   -2383.868   .0102179   -.0102179 |
 44. |               Nevada   347.9611    347.9611   .0080656   -.0080656 |
 45. |               Oregon   241.7805    241.7805   .0072873    .0072873 |
     |--------------------------------------------------------------------|
 46. |         Pennsylvania    230.294    -230.294   .0063625   -.0063625 |
 47. |             Colorado   4390.833    4390.833   .0056333    .0056333 |
 48. |              Indiana   165.3883   -165.3883   .0053708   -.0053708 |
 49. |            Tennessee   89.31062    89.31062   .0009869    .0009869 |
 50. |       North Carolina   46.82497   -46.82497   .0009033    .0009033 |
     |--------------------------------------------------------------------|
 51. |                 Iowa   17.83441    17.83441   .0008664    .0008664 |
     +--------------------------------------------------------------------+
. log close
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\Soc180B_spr2019
> _logs\class12_log.log
log type: text
closed on: 9 May 2019, 17:19:01
------------------------------------------------------------------------------------