---------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\2010_logs\class_eight.log

log type:  text

opened on:  18 Feb 2010, 14:27:39

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta",

>  clear

* We were looking at the Excel file on T-tests and discussing how changing the sample size changes the T-statistic. Below I show how to do something I mentioned in the Excel file: randomly reduce the sample size and see what happens. In theory, cutting the sample to one fourth of its size should cut the T-statistic roughly in half, because the standard error grows by a factor of sqrt(4) = 2 while the expected difference in means stays the same.

. gen random=runiform()

. summarize  random

Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

random |    133710    .5010981    .2889151   3.11e-06   .9999956

. histogram random

(bin=51, start=3.108e-06, width=.0196077)

* The histogram does not show up in the log, but trust me, the runiform() function yields a uniform distribution between 0 and 1 (note the mean of about .5 in the summary above).

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* Above is our original t-test, using the full sample of 25-to-34-year-olds.

. ttest yrsed if age>24 & age<35 & random<.25, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    2156    13.25997    .0638599    2.965191    13.13474    13.38521

Female |    2449    13.64516    .0573826    2.839716    13.53264    13.75768

---------+--------------------------------------------------------------------

combined |    4605    13.46482    .0428114    2.905188    13.38089    13.54875

---------+--------------------------------------------------------------------

diff |           -.3851891    .0856179               -.5530413   -.2173369

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -4.4989

Ho: diff = 0                                     degrees of freedom =     4603

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* The second t-test, above, uses a random one-fourth of the original dataset: keeping only cases with random<.25 selects about 25% of cases, since random is uniformly distributed between 0 and 1. In this random sample, however, the difference between the two groups is not -.244 but -.385, because random samples vary.
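
* A quick arithmetic check, not part of the original session: t is the difference in means divided by its standard error, so the two pieces can be compared separately. The numbers below are copied from the two tables above; the display commands are only a sketch of how one might verify them.
* Ratio of the standard errors (quarter sample over full sample); theory says about 2.
display .0856179/.0427623
* The t we would expect if the difference had stayed at -.244; about -2.86.
display -5.7164/2
* The t actually observed in the quarter sample, rebuilt from its own difference and standard error; about -4.50.
display -.3851891/.0856179
* So the standard error behaved as predicted, and the larger-than-expected t comes entirely from the larger difference in means.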

. ttest yrsed if age>24 & age<35 & random>.75, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    2232    13.26501    .0628787    2.970647     13.1417    13.38832

Female |    2376    13.50547    .0588604    2.869104    13.39005    13.62089

---------+--------------------------------------------------------------------

combined |    4608      13.389    .0430287    2.920886    13.30464    13.47335

---------+--------------------------------------------------------------------

diff |           -.2404624    .0860359                -.409134   -.0717908

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -2.7949

Ho: diff = 0                                     degrees of freedom =     4606

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0026         Pr(|T| > |t|) = 0.0052          Pr(T > t) = 0.9974

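. * These are two random subsets of the whole dataset. In theory, one fourth of the data should give you T/2, which here would be about -5.7164/2 = -2.86. The second subset comes close (t = -2.79) because its mean difference (-.240) is nearly the same as in the full sample (-.244). The first subset gave t = -4.50 because random subsets also have random variation in the mean difference between groups.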
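
* A sketch, not run in this session, of how one might repeat the experiment several times to see how much T varies across quarter samples. The seed value, the variable name random2, and the choice of 20 draws are arbitrary assumptions made for illustration.
set seed 12345
gen random2 = .
forvalues i = 1/20 {
    quietly replace random2 = runiform()
    quietly ttest yrsed if age>24 & age<35 & random2<.25, by(sex)
    display "draw `i': t = " %7.4f r(t)
}
* If the theory holds, the 20 values of t should scatter around -2.86, with some draws noticeably larger or smaller in magnitude, as in the two subsets above.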

. exit, clear