-------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\2011_180B_logs\class9.log

  log type:  text

 opened on:  22 Feb 2011, 13:34:20

 

. *class starts here

 

* The first thing I did was copy the little Excel worksheet of Anscombe's data (including the header row, so Stata picks up the variable names) and paste it into Stata's data editor.

 

. *(8 variables, 11 observations pasted into data editor)
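
* (An aside, not run in class: if you don't have the Excel sheet handy, you can type
* the data in directly with Stata's -input- command. Here is a sketch for just the
* first x,y pair, using the published values from Anscombe (1973); the other columns
* work the same way.)

    input x1 y1
    10  8.04
     8  6.95
    13  7.58
     9  8.81
    11  8.33
    14  9.96
     6  7.24
     4  4.26
    12 10.84
     7  4.82
     5  5.68
    end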

 

. twoway(scatter y1 x1)

* This makes a plain scatter plot of y1 against x1.

 

. twoway(scatter y1 x1) (lfit y1 x1)

 * This produces the same scatter plot as above, but superimposes the best-fit line. If we want to understand more about that line, we run the regression (as below) and look at the numbers.

 

. regress y1 x1

 

      Source |       SS       df       MS              Number of obs =      11

-------------+------------------------------           F(  1,     9) =   17.99

       Model |  27.5100011     1  27.5100011           Prob > F      =  0.0022

    Residual |  13.7626904     9  1.52918783           R-squared     =  0.6665

-------------+------------------------------           Adj R-squared =  0.6295

       Total |  41.2726916    10  4.12726916           Root MSE      =  1.2366

 

------------------------------------------------------------------------------

          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

          x1 |   .5000909   .1179055     4.24   0.002     .2333701    .7668117

       _cons |   3.000091   1.124747     2.67   0.026     .4557369    5.544445

------------------------------------------------------------------------------
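
* Reading this output: the fitted line is y1-hat = 3.00 + 0.50*x1 (the _cons and x1
* coefficients), and R-squared is 0.67. For example, at x1 = 10 the prediction is
* 3.00 + 0.50*10 = 8.00. Keep these numbers in mind; they come up again below.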

 

. predict regress_y1_x1

(option xb assumed; fitted values)

* This generates a new variable containing the predicted (fitted) values.
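
* (For the curious, not run in class: -predict- with the default xb option just
* applies the coefficients Stata stored from the last regression. A hand-rolled
* equivalent, where yhat_manual is a made-up variable name, would be:)

    generate yhat_manual = _b[_cons] + _b[x1]*x1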

 

. twoway (scatter y1 x1) (connected regress_y1_x1 x1, sort)

 * This shows that the predicted values, when connected, are the same line we saw before.
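
* (A natural next step, not run in class: -predict- can also compute residuals,
* and a residual plot makes the differences between these datasets even more
* obvious. A sketch; resid_y1_x1 is a made-up variable name:)

    predict resid_y1_x1, residuals
    twoway scatter resid_y1_x1 x1, yline(0)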

 

. twoway(scatter y2 x2) (lfit y2 x2)

 * y2 and x2 have a very different pattern (the points follow a smooth curve), but the same straight line fits them.
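
* (Not run in class: since y2 follows a smooth curve, a quadratic fit captures it
* almost perfectly. A sketch using Stata's qfit plot type:)

    twoway (scatter y2 x2) (qfit y2 x2)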

 

. twoway(scatter y3 x3) (lfit y3 x3)
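
 * In Anscombe's data, y3 is almost perfectly linear in x3 except for one outlier,
 * and that single point tilts the fitted line.

* (Not run in class: a robust regression such as Stata's -rreg-, which downweights
* outlying observations, would essentially recover the line the other ten points
* sit on. A sketch:)

    rreg y3 x3

* Below I re-run all three regressions so we can compare the results side by side.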

 

. regress y1 x1

 

      Source |       SS       df       MS              Number of obs =      11

-------------+------------------------------           F(  1,     9) =   17.99

       Model |  27.5100011     1  27.5100011           Prob > F      =  0.0022

    Residual |  13.7626904     9  1.52918783           R-squared     =  0.6665

-------------+------------------------------           Adj R-squared =  0.6295

       Total |  41.2726916    10  4.12726916           Root MSE      =  1.2366

 

------------------------------------------------------------------------------

          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

          x1 |   .5000909   .1179055     4.24   0.002     .2333701    .7668117

       _cons |   3.000091   1.124747     2.67   0.026     .4557369    5.544445

------------------------------------------------------------------------------

 

. regress y2 x2

 

      Source |       SS       df       MS              Number of obs =      11

-------------+------------------------------           F(  1,     9) =   17.97

       Model |  27.5000024     1  27.5000024           Prob > F      =  0.0022

    Residual |   13.776294     9  1.53069933           R-squared     =  0.6662

-------------+------------------------------           Adj R-squared =  0.6292

       Total |  41.2762964    10  4.12762964           Root MSE      =  1.2372

 

------------------------------------------------------------------------------

          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

          x2 |         .5   .1179638     4.24   0.002     .2331475    .7668526

       _cons |   3.000909   1.125303     2.67   0.026     .4552978     5.54652

------------------------------------------------------------------------------

 

 

. regress y3 x3

 

      Source |       SS       df       MS              Number of obs =      11

-------------+------------------------------           F(  1,     9) =   17.97

       Model |  27.4700075     1  27.4700075           Prob > F      =  0.0022

    Residual |  13.7561905     9  1.52846561           R-squared     =  0.6663

-------------+------------------------------           Adj R-squared =  0.6292

       Total |  41.2261979    10  4.12261979           Root MSE      =  1.2363

 

------------------------------------------------------------------------------

          y3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

          x3 |   .4997273   .1178777     4.24   0.002     .2330695    .7663851

       _cons |   3.002455   1.124481     2.67   0.026     .4587014    5.546208

------------------------------------------------------------------------------

 

* Notice how similar these regression results are? That is the point Anscombe wanted to make, the point Tufte makes with the same dataset, and the point I want you to think about. Sometimes the regression line can be misleading; whenever possible, you have to look at the data.
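
* (One last sketch, not run in class: you can put the plots side by side, which is
* how Anscombe's quartet is usually displayed. The graph names g1, g2, g3 are made
* up; a fourth panel for y4 and x4 would work the same way.)

    twoway (scatter y1 x1) (lfit y1 x1), name(g1, replace)
    twoway (scatter y2 x2) (lfit y2 x2), name(g2, replace)
    twoway (scatter y3 x3) (lfit y3 x3), name(g3, replace)
    graph combine g1 g2 g3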

 

. log close

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\2011_180B_logs\class9.log

  log type:  text

 closed on:  22 Feb 2011, 15:10:34

---------------------------------------------------------------------------------------