-------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web
> pages\soc_meth_proj3\2011_180B_logs\class9.log
log type: text
opened on:
. *class starts here
* The first thing I did was copy the small Excel worksheet of Anscombe's data (including row headers) and paste it into Stata's data editor.
. *(8 variables, 11 observations pasted into data editor)
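* For anyone without the Excel worksheet, here is a minimal sketch of typing the data in with Stata's input command instead of pasting. The values are Anscombe's published quartet; the variable names x4 and y4 and the column order are assumptions, since this log only names x1 through x3 and y1 through y3.

```stata
* Anscombe's quartet entered by hand (alternative to pasting from Excel)
* x4 and y4 are assumed names for the fourth pair
clear
input x1 y1 x2 y2 x3 y3 x4 y4
10  8.04 10 9.14 10  7.46  8  6.58
 8  6.95  8 8.14  8  6.77  8  5.76
13  7.58 13 8.74 13 12.74  8  7.71
 9  8.81  9 8.77  9  7.11  8  8.84
11  8.33 11 9.26 11  7.81  8  8.47
14  9.96 14 8.10 14  8.84  8  7.04
 6  7.24  6 6.13  6  6.08  8  5.25
 4  4.26  4 3.10  4  5.39 19 12.50
12 10.84 12 9.13 12  8.15  8  5.56
 7  4.82  7 7.26  7  6.42  8  7.91
 5  5.68  5 4.74  5  5.73  8  6.89
end
* Confirm that 8 variables and 11 observations were read
describe
```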
. twoway(scatter y1 x1)
* This makes just a scatter plot of y1 and x1.
. twoway(scatter y1 x1) (lfit y1 x1)
* This produces the same scatter plot as above, but also superimposes the best fit line. And if we want to understand more about the best fit line, we run the regression (as below) and look at the numbers.
. regress y1 x1
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.99
       Model |  27.5100011     1  27.5100011           Prob > F      =  0.0022
    Residual |  13.7626904     9  1.52918783           R-squared     =  0.6665
-------------+------------------------------           Adj R-squared =  0.6295
       Total |  41.2726916    10  4.12726916           Root MSE      =  1.2366
------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5000909   .1179055     4.24   0.002     .2333701    .7668117
       _cons |   3.000091   1.124747     2.67   0.026     .4557369    5.544445
------------------------------------------------------------------------------
. predict regress_y1_x1
(option xb assumed; fitted values)
* This generates a new variable, regress_y1_x1, holding the predicted (fitted) values from the regression above.
. twoway (scatter y1 x1) (connected regress_y1_x1 x1, sort)
* This shows that the predicted values, when connected, are the same line we saw before.
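* The predicted values are just the intercept plus the slope times x1, that is, roughly 3.000091 + .5000909*x1 using the coefficients in the table above. A minimal sketch of checking that by hand follows; manual_y1 and diff_y1 are hypothetical variable names, not part of the class session.

```stata
* Refit so the stored coefficients _b[] belong to the y1-on-x1 regression
regress y1 x1
* Build the fitted values by hand: intercept + slope * x1
generate manual_y1 = _b[_cons] + _b[x1]*x1
* Difference from the variable that predict created; should be zero up to rounding
generate diff_y1 = manual_y1 - regress_y1_x1
summarize diff_y1
```

* If the mean, min, and max of diff_y1 are all essentially zero, the hand calculation and predict agree.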
. twoway(scatter y2 x2) (lfit y2 x2)
* y2 and x2 follow a very different pattern (a smooth curve rather than a linear scatter), yet essentially the same best fit line comes out.
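* One way to check the shape of the data beyond the scatter plot is a residual-versus-fitted plot. A sketch, assuming the regression is refit first (rvfplot uses the most recent regress results): residuals that bend in an arc instead of scattering around zero suggest that a straight line is the wrong shape for the data.

```stata
* Fit the line for the second pair, then plot residuals against fitted values
regress y2 x2
* yline(0) adds a horizontal reference line at a residual of zero
rvfplot, yline(0)
```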
. twoway(scatter y3 x3) (lfit y3 x3)
. regress y1 x1
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.99
       Model |  27.5100011     1  27.5100011           Prob > F      =  0.0022
    Residual |  13.7626904     9  1.52918783           R-squared     =  0.6665
-------------+------------------------------           Adj R-squared =  0.6295
       Total |  41.2726916    10  4.12726916           Root MSE      =  1.2366
------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5000909   .1179055     4.24   0.002     .2333701    .7668117
       _cons |   3.000091   1.124747     2.67   0.026     .4557369    5.544445
------------------------------------------------------------------------------
. regress y2 x2
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.97
       Model |  27.5000024     1  27.5000024           Prob > F      =  0.0022
    Residual |   13.776294     9  1.53069933           R-squared     =  0.6662
-------------+------------------------------           Adj R-squared =  0.6292
       Total |  41.2762964    10  4.12762964           Root MSE      =  1.2372
------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |         .5   .1179638     4.24   0.002     .2331475    .7668526
       _cons |   3.000909   1.125303     2.67   0.026     .4552978     5.54652
------------------------------------------------------------------------------
. regress y3 x3
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.97
       Model |  27.4700075     1  27.4700075           Prob > F      =  0.0022
    Residual |  13.7561905     9  1.52846561           R-squared     =  0.6663
-------------+------------------------------           Adj R-squared =  0.6292
       Total |  41.2261979    10  4.12261979           Root MSE      =  1.2363
------------------------------------------------------------------------------
          y3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .4997273   .1178777     4.24   0.002     .2330695    .7663851
       _cons |   3.002455   1.124481     2.67   0.026     .4587014    5.546208
------------------------------------------------------------------------------
* Notice how similar these regression results are. That is the point Anscombe wanted to make, it is the point Tufte makes with the same dataset, and it is the point I want you to think about: the regression line can be misleading, so whenever possible you have to look at the data.
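* The worksheet had 8 variables, so there is a fourth pair that was not plotted in this session. A sketch of running all four regressions and graphs in one loop, assuming the fourth pair is named x4 and y4 (those names do not appear in this log); anscombe1 through anscombe4 are hypothetical graph names.

```stata
* Loop over the four x/y pairs; x4 and y4 are assumed variable names
forvalues i = 1/4 {
    regress y`i' x`i'
    * Store each scatter plot plus best fit line under its own name
    twoway (scatter y`i' x`i') (lfit y`i' x`i'), name(anscombe`i', replace)
}
* Put the four stored graphs into a single figure for comparison
graph combine anscombe1 anscombe2 anscombe3 anscombe4
```

* With the four panels next to each other, it is easy to compare near-identical fitted lines against the very different clouds of points.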
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s
> oc_meth_proj3\2011_180B_logs\class9.log
log type: text
closed on:
---------------------------------------------------------------------------------------