---------------------------------------------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class14.log

  log type:  text

 opened on:   3 Nov 2021, 09:54:35

 

 

. *class starts here.

 

. clear all

 

* First we opened the Excel file for the Anscombe dataset and copied the data into the Stata Data Editor (find the Data Editor among the icons under the Stata menus, or go to Window > Data Editor).

 

. *(8 variables, 11 observations pasted into data editor)
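
* A reproducible alternative to pasting from Excel is to type the values in with the input command. This is only a sketch: it assumes the spreadsheet held the standard published Anscombe quartet values, and that the eight variables were named x1 y1 ... x4 y4 (the commands in this log use only x1-x3 and y1-y3, so x4 and y4 are assumed names).

```stata
* Enter the Anscombe quartet by hand instead of pasting from Excel
* (values assumed to be the standard published ones; x4 and y4 are assumed names)
clear
input x1 y1 x2 y2 x3 y3 x4 y4
10  8.04 10 9.14 10  7.46  8  6.58
 8  6.95  8 8.14  8  6.77  8  5.76
13  7.58 13 8.74 13 12.74  8  7.71
 9  8.81  9 8.77  9  7.11  8  8.84
11  8.33 11 9.26 11  7.81  8  8.47
14  9.96 14 8.10 14  8.84  8  7.04
 6  7.24  6 6.13  6  6.08  8  5.25
 4  4.26  4 3.10  4  5.39 19 12.50
12 10.84 12 9.13 12  8.15  8  5.56
 7  4.82  7 7.26  7  6.42  8  7.91
 5  5.68  5 4.74  5  5.73  8  6.89
end
* quick check that the data went in as expected: 11 observations, 8 variables
describe, short
```

* If the values match the spreadsheet, "regress y1 x1" on this data should give the same slope and intercept as the regression output later in this log.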

 

. twoway (scatter y2 x2) (lfit y2 x2)

 

. twoway (scatter y1 x1) (lfit y1 x1)

 

. twoway (scatter y3 x3) (lfit y3 x3)

 

* Things to note about the Anscombe data pairs (Yn and Xn): the scatter plots all look different, but the best-fit OLS line, which is what “lfit y3 x3” gives us, looks the same in each.

 

 

. regress y1 x1

 

      Source |       SS           df       MS      Number of obs   =        11

-------------+----------------------------------   F(1, 9)         =     17.99

       Model |  27.5100011         1  27.5100011   Prob > F        =    0.0022

    Residual |  13.7626904         9  1.52918783   R-squared       =    0.6665

-------------+----------------------------------   Adj R-squared   =    0.6295

       Total |  41.2726916        10  4.12726916   Root MSE        =    1.2366

 

------------------------------------------------------------------------------

          y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]

-------------+----------------------------------------------------------------

          x1 |   .5000909   .1179055     4.24   0.002     .2333701    .7668117

       _cons |   3.000091   1.124747     2.67   0.026     .4557369    5.544445

------------------------------------------------------------------------------

 

. regress y2 x2

 

      Source |       SS           df       MS      Number of obs   =        11

-------------+----------------------------------   F(1, 9)         =     17.97

       Model |  27.5000024         1  27.5000024   Prob > F        =    0.0022

    Residual |   13.776294         9  1.53069933   R-squared       =    0.6662

-------------+----------------------------------   Adj R-squared   =    0.6292

       Total |  41.2762964        10  4.12762964   Root MSE        =    1.2372

 

------------------------------------------------------------------------------

          y2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]

-------------+----------------------------------------------------------------

          x2 |         .5   .1179638     4.24   0.002     .2331475    .7668526

       _cons |   3.000909   1.125303     2.67   0.026     .4552978     5.54652

------------------------------------------------------------------------------

 

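* In fact, the two lines are nearly identical: the slopes (.5000909 and .5) and the intercepts (3.000091 and 3.000909) agree to two decimal places.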
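
* To compare the fitted lines without reading through each regression table, one option is to loop over the pairs and print only the coefficients. This sketch assumes the Anscombe data are still in memory with the variable names used above (y1-y3, x1-x3).

```stata
* Fit each Anscombe pair and print only the slope and intercept
forvalues i = 1/3 {
    quietly regress y`i' x`i'
    display "pair `i': slope = " %9.7f _b[x`i'] "   intercept = " %9.7f _b[_cons]
}
```

* The first two lines of output should repeat the coefficients from the two regress tables above; the third pair was only graphed in class, so its line is new information.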

 

 

. clear all

 

* Now on to our 50-state summary data (the 50 states plus DC, so 51 observations):

 

. use "C:\Users\mexmi\Documents\current class files\intro soc methods\fifty_state_dataset.dta"

 

. twoway (scatter incwage  NH_White_proportion, mlabel(statefip)) (lfit incwage NH_White_proportion)

 

* The command above produces a graph of states by average income and proportion non-Hispanic White. The best-fit line seems to show that the whiter the state, the lower the average income.

 

 

. summarize  NH_White_proportion

 

    Variable |        Obs        Mean    Std. dev.       Min        Max

-------------+---------------------------------------------------------

NH_White_p~n |         51    .7626632    .1633623   .2354178   .9835737

 

. regress incwage  NH_White_proportion

 

      Source |       SS           df       MS      Number of obs   =        51

-------------+----------------------------------   F(1, 49)        =      2.14

       Model |  18878316.5         1  18878316.5   Prob > F        =    0.1500

    Residual |   432407199        49  8824636.71   R-squared       =    0.0418

-------------+----------------------------------   Adj R-squared   =    0.0223

       Total |   451285515        50   9025710.3   Root MSE        =    2970.6

 

-------------------------------------------------------------------------------------

            incwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]

--------------------+----------------------------------------------------------------

NH_White_proportion |   -3761.36   2571.649    -1.46   0.150    -8929.282    1406.562

              _cons |   22161.23   2004.928    11.05   0.000     18132.18    26190.29

-------------------------------------------------------------------------------------

 

. predict M1_predicted

(option xb assumed; fitted values)

 

. gen residual=incwage- M1_predicted

 

. predict residual_v2, residual

 

. *two ways to generate the residuals.
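
* A quick way to confirm that the two residual variables agree is sketched below. It assumes residual and residual_v2 exist as created above. Tiny differences are possible because M1_predicted is stored as a float before the subtraction, so this is a sanity check rather than a test of exact equality.

```stata
* Count how the two residual variables line up, observation by observation
compare residual residual_v2
* Flag any state where the two versions differ by more than a cent
list statefip residual residual_v2 if abs(residual - residual_v2) > .01
* OLS residuals from a model with a constant should average to roughly zero
summarize residual residual_v2
```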

 

* We generate the residuals and then a new variable holding their absolute value, because we want to know which state is furthest from the line; we don't care whether it is above or below.

 

. gen abs_residual=abs(residual)

 

. gsort -abs_residual

 

. *gsort puts our data, our 51 states, in declining order of absolute value of residual (the minus in front of the variable tells Stata to sort from largest to smallest).

 

* Now we use the dfbeta command to generate dfbetas for the X variable of interest; here there is only one X variable.

 

. dfbeta ( NH_White_proportion)

 

 

Generating DFBETA variable ...

 

    _dfbeta_1: DFBETA NH_White_proportion

 

. gen abs_dfbeta=abs(_dfbeta_1)

 

* We want to know which states are most influential on the line, i.e. have the largest dfbeta in absolute value, so we generate a new variable with the absolute value of the dfbeta.

 

* States listed from largest absolute value residual to smallest:

 

. list statefip abs_residual residual  abs_dfbeta  _dfbeta_1

 

     +--------------------------------------------------------------------+

     |             statefip   abs_re~l    residual   abs_df~a   _dfbeta_1 |

     |--------------------------------------------------------------------|

  1. |          Connecticut   5617.314    5617.314   .0487289    .0487289 |

  2. |           New Jersey   5432.736    5432.736   .1172426   -.1172426 |

  3. |           New Mexico   5139.932   -5139.932   .5375599    .5375599 |

  4. |              Montana   5024.921   -5024.921   .2145686   -.2145686 |

  5. |          Mississippi   5009.293   -5009.293   .2366829    .2366829 |

     |--------------------------------------------------------------------|

  6. |             Maryland   4840.275    4840.275   .1743481   -.1743481 |

  7. |        West Virginia   4798.199   -4798.199   .2921322   -.2921322 |

  8. |        Massachusetts   4666.598    4666.598    .098265     .098265 |

  9. |             Arkansas   4460.403   -4460.403   .0185422   -.0185422 |

 10. |             Colorado   4390.833    4390.833   .0056333    .0056333 |

     |--------------------------------------------------------------------|

 11. |         North Dakota   4204.203   -4204.203    .192666    -.192666 |

 12. |            Minnesota    4187.81     4187.81   .1740437    .1740437 |

 13. |               Alaska   3780.312    3780.312   .0506594   -.0506594 |

 14. |            Louisiana   3555.554   -3555.554   .1484154    .1484154 |

 15. |              Alabama   3416.563   -3416.563   .0482474    .0482474 |

     |--------------------------------------------------------------------|

 16. | District of Columbia   3381.999    3381.999   .5903278   -.5903278 |

 17. |             Michigan   3208.876    3208.876   .0356351    .0356351 |

 18. |        New Hampshire   3021.199    3021.199   .1813799    .1813799 |

 19. |         South Dakota   2937.162   -2937.162   .1303558   -.1303558 |

 20. |             Virginia   2706.413    2706.413   .0332524   -.0332524 |

     |--------------------------------------------------------------------|

 21. |             Illinois   2687.422    2687.422     .04929     -.04929 |

 22. |           Washington   2536.313    2536.313   .0665131    .0665131 |

 23. |             Oklahoma   2383.868   -2383.868   .0102179   -.0102179 |

 24. |                Idaho   2289.952   -2289.952    .070259    -.070259 |

 25. |             Kentucky   2257.391   -2257.391    .075205    -.075205 |

     |--------------------------------------------------------------------|

 26. |            Wisconsin   2079.558    2079.558   .0656909    .0656909 |

 27. |       South Carolina    1986.75    -1986.75    .022419     .022419 |

 28. |             Delaware   1970.206    1970.206   .0329438   -.0329438 |

 29. |              Wyoming   1860.133   -1860.133   .0957266   -.0957266 |

 30. |              Florida   1833.188   -1833.188   .0603009    .0603009 |

     |--------------------------------------------------------------------|

 31. |              Arizona    1755.73    -1755.73    .062599     .062599 |

 32. |               Hawaii   1728.106   -1728.106   .3419163    .3419163 |

 33. |         Rhode Island   1562.148    1562.148   .0499289    .0499289 |

 34. |             Missouri   1460.693    1460.693   .0429018    .0429018 |

 35. |             Nebraska     1231.4     -1231.4   .0429309   -.0429309 |

     |--------------------------------------------------------------------|

 36. |                 Ohio   1017.316    1017.316    .024228     .024228 |

 37. |               Kansas   1001.302   -1001.302    .021283    -.021283 |

 38. |             New York   998.9494    998.9494   .0335865   -.0335865 |

 39. |              Georgia   939.2884   -939.2884   .0433774    .0433774 |

 40. |                Texas    873.475    -873.475   .0654734    .0654734 |

     |--------------------------------------------------------------------|

 41. |                 Utah   544.7642   -544.7642   .0203697   -.0203697 |

 42. |                Maine    536.482    -536.482   .0362305   -.0362305 |

 43. |               Nevada   347.9611    347.9611   .0080656   -.0080656 |

 44. |              Vermont   328.0741   -328.0741   .0203959   -.0203959 |

 45. |           California   294.7873    294.7873   .0240138   -.0240138 |

     |--------------------------------------------------------------------|

 46. |               Oregon   241.7805    241.7805   .0072873    .0072873 |

 47. |         Pennsylvania    230.294    -230.294   .0063625   -.0063625 |

 48. |              Indiana   165.3883   -165.3883   .0053708   -.0053708 |

 49. |            Tennessee   89.31062    89.31062   .0009869    .0009869 |

 50. |       North Carolina   46.82497   -46.82497   .0009033    .0009033 |

     |--------------------------------------------------------------------|

 51. |                 Iowa   17.83441    17.83441   .0008664    .0008664 |

     +--------------------------------------------------------------------+

 

* CT and NJ have the largest residuals but small dfbetas.

 

* Now re-sort the data from largest to smallest absolute value dfbeta, and re-list:

 

. gsort - abs_dfbeta

 

. list statefip abs_dfbeta  _dfbeta_1 abs_residual residual

 

     +--------------------------------------------------------------------+

     |             statefip   abs_df~a   _dfbeta_1   abs_re~l    residual |

     |--------------------------------------------------------------------|

  1. | District of Columbia   .5903278   -.5903278   3381.999    3381.999 |

  2. |           New Mexico   .5375599    .5375599   5139.932   -5139.932 |

  3. |               Hawaii   .3419163    .3419163   1728.106   -1728.106 |

  4. |        West Virginia   .2921322   -.2921322   4798.199   -4798.199 |

  5. |          Mississippi   .2366829    .2366829   5009.293   -5009.293 |

     |--------------------------------------------------------------------|

  6. |              Montana   .2145686   -.2145686   5024.921   -5024.921 |

  7. |         North Dakota    .192666    -.192666   4204.203   -4204.203 |

  8. |        New Hampshire   .1813799    .1813799   3021.199    3021.199 |

  9. |             Maryland   .1743481   -.1743481   4840.275    4840.275 |

 10. |            Minnesota   .1740437    .1740437    4187.81     4187.81 |

     |--------------------------------------------------------------------|

 11. |            Louisiana   .1484154    .1484154   3555.554   -3555.554 |

 12. |         South Dakota   .1303558   -.1303558   2937.162   -2937.162 |

 13. |           New Jersey   .1172426   -.1172426   5432.736    5432.736 |

 14. |        Massachusetts    .098265     .098265   4666.598    4666.598 |

 15. |              Wyoming   .0957266   -.0957266   1860.133   -1860.133 |

     |--------------------------------------------------------------------|

 16. |             Kentucky    .075205    -.075205   2257.391   -2257.391 |

 17. |                Idaho    .070259    -.070259   2289.952   -2289.952 |

 18. |           Washington   .0665131    .0665131   2536.313    2536.313 |

 19. |            Wisconsin   .0656909    .0656909   2079.558    2079.558 |

 20. |                Texas   .0654734    .0654734    873.475    -873.475 |

     |--------------------------------------------------------------------|

 21. |              Arizona    .062599     .062599    1755.73    -1755.73 |

 22. |              Florida   .0603009    .0603009   1833.188   -1833.188 |

 23. |               Alaska   .0506594   -.0506594   3780.312    3780.312 |

 24. |         Rhode Island   .0499289    .0499289   1562.148    1562.148 |

 25. |             Illinois     .04929     -.04929   2687.422    2687.422 |

     |--------------------------------------------------------------------|

 26. |          Connecticut   .0487289    .0487289   5617.314    5617.314 |

 27. |              Alabama   .0482474    .0482474   3416.563   -3416.563 |

 28. |              Georgia   .0433774    .0433774   939.2884   -939.2884 |

 29. |             Nebraska   .0429309   -.0429309     1231.4     -1231.4 |

 30. |             Missouri   .0429018    .0429018   1460.693    1460.693 |

     |--------------------------------------------------------------------|

 31. |                Maine   .0362305   -.0362305    536.482    -536.482 |

 32. |             Michigan   .0356351    .0356351   3208.876    3208.876 |

 33. |             New York   .0335865   -.0335865   998.9494    998.9494 |

 34. |             Virginia   .0332524   -.0332524   2706.413    2706.413 |

 35. |             Delaware   .0329438   -.0329438   1970.206    1970.206 |

     |--------------------------------------------------------------------|

 36. |                 Ohio    .024228     .024228   1017.316    1017.316 |

 37. |           California   .0240138   -.0240138   294.7873    294.7873 |

 38. |       South Carolina    .022419     .022419    1986.75    -1986.75 |

 39. |               Kansas    .021283    -.021283   1001.302   -1001.302 |

 40. |              Vermont   .0203959   -.0203959   328.0741   -328.0741 |

     |--------------------------------------------------------------------|

 41. |                 Utah   .0203697   -.0203697   544.7642   -544.7642 |

 42. |             Arkansas   .0185422   -.0185422   4460.403   -4460.403 |

 43. |             Oklahoma   .0102179   -.0102179   2383.868   -2383.868 |

 44. |               Nevada   .0080656   -.0080656   347.9611    347.9611 |

 45. |               Oregon   .0072873    .0072873   241.7805    241.7805 |

     |--------------------------------------------------------------------|

 46. |         Pennsylvania   .0063625   -.0063625    230.294    -230.294 |

 47. |             Colorado   .0056333    .0056333   4390.833    4390.833 |

 48. |              Indiana   .0053708   -.0053708   165.3883   -165.3883 |

 49. |            Tennessee   .0009869    .0009869   89.31062    89.31062 |

 50. |       North Carolina   .0009033    .0009033   46.82497   -46.82497 |

     |--------------------------------------------------------------------|

 51. |                 Iowa   .0008664    .0008664   17.83441    17.83441 |

     +--------------------------------------------------------------------+

 

 

 

* DC, NM, and HI, the three outlier states with the lowest proportion of NH White people, have the most influence on the slope because they are outliers in X.
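
* One way to check the "outliers in X" explanation is to look at leverage (hat values) next to the dfbetas. This is a sketch: it assumes the full 51-observation regression is still the most recent estimation, lev is a made-up variable name, and 2/sqrt(n) is a common rule-of-thumb cutoff for dfbetas that was not used in class.

```stata
* Leverage measures how unusual each state's X value is
predict lev, leverage
* Rule-of-thumb cutoff for |dfbeta|; about .28 when there are 51 observations
display 2/sqrt(_N)
* List the states above the cutoff, highest leverage first
gsort -lev
list statefip NH_White_proportion lev _dfbeta_1 if abs(_dfbeta_1) > 2/sqrt(_N)
```

* Going by the dfbetas listed above, four states exceed this cutoff: DC, New Mexico, Hawaii, and West Virginia.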

 

. codebook statefip, tab(60)

 

-------------------------------------------------------------------------------------------------------

statefip                                                                              State (FIPS code)

-------------------------------------------------------------------------------------------------------

 

                  Type: Numeric (byte)

                 Label: statefiplbl

 

                 Range: [1,56]                        Units: 1

         Unique values: 51                        Missing .: 0/51

 

            Tabulation: Freq.   Numeric  Label

                            1         1  Alabama

                            1         2  Alaska

                            1         4  Arizona

                            1         5  Arkansas

                            1         6  California

                            1         8  Colorado

                            1         9  Connecticut

                            1        10  Delaware

                            1        11  District of Columbia

                            1        12  Florida

                            1        13  Georgia

                            1        15  Hawaii

                            1        16  Idaho

                            1        17  Illinois

                            1        18  Indiana

                            1        19  Iowa

                            1        20  Kansas

                            1        21  Kentucky

                            1        22  Louisiana

                            1        23  Maine

                            1        24  Maryland

                            1        25  Massachusetts

                            1        26  Michigan

                            1        27  Minnesota

                            1        28  Mississippi

                            1        29  Missouri

                            1        30  Montana

                            1        31  Nebraska

                            1        32  Nevada

                            1        33  New Hampshire

                            1        34  New Jersey

                            1        35  New Mexico

                            1        36  New York

                            1        37  North Carolina

                            1        38  North Dakota

                            1        39  Ohio

                            1        40  Oklahoma

                            1        41  Oregon

                            1        42  Pennsylvania

                            1        44  Rhode Island

                            1        45  South Carolina

                            1        46  South Dakota

                            1        47  Tennessee

                            1        48  Texas

                            1        49  Utah

                            1        50  Vermont

                            1        51  Virginia

                            1        53  Washington

                            1        54  West Virginia

                            1        55  Wisconsin

                            1        56  Wyoming

 

* The meaning of the dfbetas: take the original slope and its SE; each observation's dfbeta is how much the slope differs, in units of SE, when that observation is included rather than dropped. For DC the dfbeta was -0.59, the original slope was -3761, and the SE of the slope was about 2572, so dropping DC should raise the slope by about 0.59 SE. Without DC, this is what we get:

 

 

. regress incwage  NH_White_proportion if statefip~=11

 

      Source |       SS           df       MS      Number of obs   =        50

-------------+----------------------------------   F(1, 48)        =      0.64

       Model |  5576993.22         1  5576993.22   Prob > F        =    0.4276

    Residual |   418239982        48  8713332.96   R-squared       =    0.0132

-------------+----------------------------------   Adj R-squared   =   -0.0074

       Total |   423816975        49  8649326.02   Root MSE        =    2951.8

 

-------------------------------------------------------------------------------------

            incwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]

--------------------+----------------------------------------------------------------

NH_White_proportion |  -2252.848   2815.944    -0.80   0.428    -7914.684    3408.987

              _cons |   20928.61   2214.384     9.45   0.000     16476.29    25380.93

-------------------------------------------------------------------------------------

 

. display -3761.36+ (0.59033*2571.7)

-2243.2083

 

* My by-hand calculation of the slope without DC based on the dfbeta (-2243.21) is not exactly the same as the actual slope without DC (-2252.85), but it is close.
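
* The small gap has a likely explanation. dfbeta is scaled by a standard error built from the Root MSE of the regression that leaves the observation out, not the full-sample Root MSE. Rescaling the full-sample SE (2571.649) by the ratio of the two Root MSEs reported in this log (2951.8 without DC, 2970.6 with DC) gives a closer match. This is a sketch that uses only the rounded numbers printed above, so it can reproduce the refit slope only up to rounding.

```stata
* slope without DC = full slope - dfbeta * (full SE * Root MSE without DC / Root MSE with DC)
display -3761.36 - (-.5903278)*2571.649*(2951.8/2970.6)
```

* The result lands within a fraction of a unit of the refit slope of -2252.848. The adjustment needs the Root MSE from the regression without DC, so it explains the gap rather than offering a shortcut around the refit.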

 

. log close

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class14.log

  log type:  text

 closed on:   3 Nov 2021, 12:50:48

-------------------------------------------------------------------------------------------------------