-----------------------------------------------------------------------------------

       log:  C:\AAA Miker Files\newer web pages\soc_meth_proj3\section3_2009.log
  log type:  text
 opened on:  10 Feb 2009, 15:25:54

 

. set mem 200m

 

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory         200M     max. data space                200.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                               203.163M

 

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

 

. xi i.occ1990

--Break--

r(1);

 

*It might seem convenient to generate dummy variables for the entire set of occupational categories, but it is not a good idea: there are something like a thousand different occupational categories, so xi would add about a thousand dummy variables to your dataset, and you might run out of memory. It is much better to first create a variable that holds only the 3 occupations you want, perhaps setting all others to missing, and then use that variable to generate the dummies.
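
*A minimal sketch of that approach, assuming the three occupations of interest have the occ1990 codes 84, 156, and 178. These codes and the variable name occ3 are placeholders for illustration only; substitute the codes you actually need.

```stata
* Keep only three occupations; every other occupation becomes missing.
* The codes 84, 156, 178 and the name occ3 are hypothetical placeholders.
generate occ3 = occ1990 if inlist(occ1990, 84, 156, 178)

* Check that only the three chosen codes (plus missing) remain.
tabulate occ3, missing

* xi now creates dummies for three categories instead of about a thousand,
* leaving one category out as the reference.
xi i.occ3
```

*Cases with occ3 missing drop out of any regression that uses these dummies, so the comparison is among the three chosen occupations only.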

 

. memory

                                                  bytes
--------------------------------------------------------------------
Details of set memory usage
    overhead (pointers)                         534,840        0.26%
    data                                     45,327,690       21.61%
                                        ----------------------------
    data + overhead                          45,862,530       21.87%
    free                                    163,852,662       78.13%
                                        ----------------------------
    Total allocated                         209,715,192      100.00%
--------------------------------------------------------------------
Other memory usage
    set maxvar usage                          2,001,730
    set matsize usage                         1,315,200
    programs, saved results, etc.                76,822
                                        ---------------
    Total                                     3,393,752
-------------------------------------------------------
Grand total                                 213,108,944

 

. *xi of occ1990 was going to make a thousand new dummy variables. That is not what we want or need.

 

. drop _Iocc*

 

. memory

                                                  bytes
--------------------------------------------------------------------
Details of set memory usage
    overhead (pointers)                         534,840        0.26%
    data                                     15,242,940        7.27%
                                        ----------------------------
    data + overhead                          15,777,780        7.52%
    free                                    193,937,412       92.48%
                                        ----------------------------
    Total allocated                         209,715,192      100.00%
--------------------------------------------------------------------
Other memory usage
    set maxvar usage                          2,001,730
    set matsize usage                         1,315,200
    programs, saved results, etc.                73,958
                                        ---------------
    Total                                     3,390,888
-------------------------------------------------------
Grand total                                 213,106,080

 

. tabulate metro

 

  Metropolitan central city |
                     status |      Freq.     Percent        Cum.
----------------------------+-----------------------------------
           Not identifiable |        340        0.25        0.25
          Not in metro area |     29,658       22.18       22.44
               Central city |     32,481       24.29       46.73
       Outside central city |     51,468       38.49       85.22
Central city status unknown |     19,763       14.78      100.00
----------------------------+-----------------------------------
                      Total |    133,710      100.00

 

. tabulate metro, nolab

 

Metropolita |
  n central |
city status |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        340        0.25        0.25
          1 |     29,658       22.18       22.44
          2 |     32,481       24.29       46.73
          3 |     51,468       38.49       85.22
          4 |     19,763       14.78      100.00
------------+-----------------------------------
      Total |    133,710      100.00

 

. char metro[omit] 1

 

*redoing some things we did in class, making dummy variables for metro and putting them in the regression.

 

. xi i.metro

i.metro           _Imetro_0-4         (naturally coded; _Imetro_1 omitted)

 

. regress incwage _Imetro* if age>29 & age<65 & sex==1

 

      Source |       SS       df       MS              Number of obs =   29335
-------------+------------------------------           F(  4, 29330) =  190.17
       Model |  1.1316e+12     4  2.8291e+11           Prob > F      =  0.0000
    Residual |  4.3633e+13 29330  1.4877e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0251
       Total |  4.4765e+13 29334  1.5260e+09           Root MSE      =   38570

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Imetro_0 |   4553.396   4006.316     1.14   0.256    -3299.164    12405.96
   _Imetro_2 |   7255.712   667.5305    10.87   0.000     5947.322    8564.102
   _Imetro_3 |   16013.39   593.5204    26.98   0.000     14850.06    17176.71
   _Imetro_4 |   8368.313   758.1121    11.04   0.000      6882.38    9854.247
       _cons |   27189.65   473.7616    57.39   0.000     26261.05    28118.24
------------------------------------------------------------------------------

 

. *These are the results we got today in class...

 

. *metro==1, rural, is the excluded category here. Everything else is compared to rural.

 

. *The constant is the excluded category mean.

 

. table metro if age>29 & age<65 & sex==1, contents (mean incwage)

 

-------------------------------------------
Metropolitan central city   |
status                      | mean(incwage)
----------------------------+--------------
           Not identifiable |   31743.04255
          Not in metro area |    27189.6465
               Central city |   34445.35841
       Outside central city |    43203.0348
Central city status unknown |   35557.95997
-------------------------------------------
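
*The link between the unweighted regression and this table of means can be checked by hand: the constant plus each dummy coefficient should reproduce that category's mean. A small sketch, with the numbers copied from the regression output above (the coefficients are rounded, so the sums should agree with the table to about the second decimal):

```stata
* Constant (mean for metro==1, not in metro area) plus each coefficient.
* All numbers are typed in from the unweighted regress output in this log.

* Not identifiable: compare with 31743.04 in the table
display 27189.65 + 4553.396

* Central city: compare with 34445.36
display 27189.65 + 7255.712

* Outside central city: compare with 43203.03
display 27189.65 + 16013.39

* Central city status unknown: compare with 35557.96
display 27189.65 + 8368.313
```

*Right after the regress command, the same check can be made without retyping numbers, for example with display _b[_cons] + _b[_Imetro_2].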

 

. *One variation is to include the weights

 

. table metro if age>29 & age<65 & sex==1 [aweight= perwt_rounded], contents (mean incwage)

 

-------------------------------------------
Metropolitan central city   |
status                      | mean(incwage)
----------------------------+--------------
           Not identifiable |    32020.1697
          Not in metro area |   27344.17913
               Central city |    34517.6849
       Outside central city |   43963.55122
Central city status unknown |   35398.57026
-------------------------------------------

 

. table metro if age>29 & age<65 & sex==1 [aweight= perwt_rounded], contents (mean incwage sd incwage freq)

 

-------------------------------------------------------------------------
Metropolitan central city   |
status                      | mean(incwage)    sd(incwage)          Freq.
----------------------------+--------------------------------------------
           Not identifiable |    32020.1697       27352.47             94
          Not in metro area |   27344.17913       28233.76          6,628
               Central city |    34517.6849       38462.56          6,727
       Outside central city |   43963.55122       44645.15         11,639
Central city status unknown |   35398.57026       36143.29          4,247
-------------------------------------------------------------------------

 

.

. *The thing about aweight is that it adjusts the mean but not the N.

 

. table metro if age>29 & age<65 & sex==1, contents (mean incwage sd incwage freq)

 

-------------------------------------------------------------------------
Metropolitan central city   |
status                      | mean(incwage)    sd(incwage)          Freq.
----------------------------+--------------------------------------------
           Not identifiable |   31743.04255       27474.74             94
          Not in metro area |    27189.6465       28299.05          6,628
               Central city |   34445.35841       38491.83          6,727
       Outside central city |    43203.0348       44057.68         11,639
Central city status unknown |   35557.95997       36639.06          4,247
-------------------------------------------------------------------------

 

. *In theory, the aweighted analysis is better, because the weights adjust the sample to represent the population.

 

. table metro if age>29 & age<65 & sex==1 [aweight= perwt_rounded], contents (mean incwage sd incwage freq)

 

-------------------------------------------------------------------------
Metropolitan central city   |
status                      | mean(incwage)    sd(incwage)          Freq.
----------------------------+--------------------------------------------
           Not identifiable |    32020.1697       27352.47             94
          Not in metro area |   27344.17913       28233.76          6,628
               Central city |    34517.6849       38462.56          6,727
       Outside central city |   43963.55122       44645.15         11,639
Central city status unknown |   35398.57026       36143.29          4,247
-------------------------------------------------------------------------

 

. regress incwage _Imetro* if age>29 & age<65 & sex==1 [aweight= perwt_rounded]

(sum of wgt is   6.0783e+07)

      Source |       SS       df       MS              Number of obs =   29335
-------------+------------------------------           F(  4, 29330) =  191.94
       Model |  1.1913e+12     4  2.9784e+11           Prob > F      =  0.0000
    Residual |  4.5511e+13 29330  1.5517e+09           R-squared     =  0.0255
-------------+------------------------------           Adj R-squared =  0.0254
       Total |  4.6703e+13 29334  1.5921e+09           Root MSE      =   39392

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Imetro_0 |   4675.991   4166.002     1.12   0.262    -3489.561    12841.54
   _Imetro_2 |   7173.506   713.0054    10.06   0.000     5775.983    8571.028
   _Imetro_3 |   16619.37   632.9456    26.26   0.000     15378.77    17859.97
   _Imetro_4 |   8054.391   819.5563     9.83   0.000     6448.024    9660.758
       _cons |   27344.18   529.7572    51.62   0.000     26305.83    28382.53
------------------------------------------------------------------------------

 

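. *With only dummy variables on the right-hand side, standard regression is really a regression of mean or average values: the constant is the weighted mean of the excluded category, and the constant plus each coefficient is the weighted mean of that category, matching the aweighted table above.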

 

*Rather than use xi, I prefer another dummy variable generator called desmat, a free add-on to Stata.

 

. ssc install desmat, replace

checking desmat consistency and verifying not already installed...

 

the following files will be replaced:

    c:\ado\stbplus\d\desmat.ado

 

installing into c:\ado\stbplus\...

installation complete.

 

. desmat: regress incwage metro=ind(2) if age>29 & age<65 & sex==1 [aweight= perwt_rounded]

---------------------------------------------------------------------------------
   Linear regression
---------------------------------------------------------------------------------
   Dependent variable                                                    incwage
   Number of observations:                                                 29335
   aweight:                                                        perwt_rounded
   F statistic:                                                          191.942
   Model degrees of freedom:                                                   4
   Residual degrees of freedom:                                            29330
   R-squared:                                                              0.026
   Adjusted R-squared:                                                     0.025
   Root MSE                                                            39391.542
   Prob:                                                                   0.000
---------------------------------------------------------------------------------
nr Effect                                                      Coeff        s.e.
---------------------------------------------------------------------------------
   metro
1    Not identifiable                                       4675.991    4166.002
2    Central city                                           7173.506**   713.005
3    Outside central city                                  16619.372**   632.946
4    Central city status unknown                            8054.391**   819.556
5  _cons                                                   27344.179**   529.757
---------------------------------------------------------------------------------
*  p < .05
** p < .01

 

. *I used desmat to run the regression and to make the dummy variables at the same time, also telling Stata to use the second category of metro (metro==1, not in metro area) as the excluded category.

 

. *desmat creates its own dummy variables, coding them _x_1, _x_2, etc.

 

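*How do we calculate p-values from a given t statistic? Let's say the t statistic is 2.5 and the N is 1500. The first argument of ttail() is the degrees of freedom, which for an N this large is close to N.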

 

. display ttail (1500, 2.5)

ttail not found

r(111);

 

. display ttail(1500, 2.5)

.0062627

 

. *How to generate P values from a T statistic

 

. *For a 2-tailed test, we would want to double the p-value that Stata gives us (which is just the one-sided tail probability).

 

. display .0062627*2

.0125254

 

. *a little more than 1%, but less than 5%

 

. display 2*(ttail(20, 2.5))

.02123355

 

. *Note that ttail, like other Stata functions that take their arguments in parentheses, does not accept a space between the function name and the opening parenthesis.
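
*The same calculation can be tied to a fitted regression, so that the t statistic and degrees of freedom do not have to be typed in by hand. A minimal sketch, assuming Stata's built-in auto example dataset rather than the CPS file used in this log (price and mpg are variables in that example dataset, not in ours):

```stata
* Illustration only: load the example dataset shipped with Stata.
* Note that this replaces whatever data are currently in memory.
sysuse auto, clear

regress price mpg

* t statistic for mpg: coefficient divided by its standard error
display _b[mpg]/_se[mpg]

* Two-sided p-value: double the upper-tail probability of |t|,
* using the residual degrees of freedom that regress stores in e(df_r)
display 2*ttail(e(df_r), abs(_b[mpg]/_se[mpg]))
```

*The result should agree with the P>|t| column that regress prints for mpg. Taking abs() first matters because ttail() gives the upper-tail probability, so a negative t statistic would otherwise produce a value above 1 after doubling.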

 

. clear all

 

. exit, clear