---------------------------------------------------------------------------------
      name:  <unnamed>
       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\2010_logs\class_nine.log
  log type:  text
 opened on:  23 Feb 2010, 14:01:05

. clear all

 

. *(8 variables, 11 observations pasted into data editor)

 

*It doesn't show up in the log, but to copy data from Excel into Stata, just copy the cells in Excel and paste them into Stata's Data Editor window.

 

. twoway(scatter x1 y1)

 

. twoway(scatter y1 x1)

 

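*We wanted (scatter y x): in twoway scatter, the first variable listed goes on the vertical (y) axis and the second goes on the horizontal (x) axis.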

 

. twoway(scatter y1 x1) (lfit y1 x1)

 

* lfit adds the linear best-fit (least-squares regression) line of y1 on x1 to the graph.
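* A minimal Python sketch of what the lfit line is: the ordinary least-squares line of y on x, with slope Sxy/Sxx and intercept ybar - slope*xbar. The points below are made up for illustration (they are not the pasted Excel data), and ols_line is a hypothetical helper name, not a Stata command.

```python
def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx                # co-variation of x and y over variation of x
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope


# Hypothetical data for illustration only; these are not the class data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_line(xs, ys)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
```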

 

 

. twoway(scatter y2 x2) (lfit y2 x2)

 

. *One reason we graph data is so that we can see whether our best fit line or our best fit function really fits the data

 

. clear

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta",

>  clear

 

. *A free Stata update is available.

 

. *update all

 

. *update swap

 

* To install it, run the update all command without the asterisk.

 

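. regress incwage lawyers sociologists male if occ1990==178| occ1990==95 | occ1990==125
variable lawyers not found
r(111);

* This fails because the dataset has no variables named lawyers or sociologists; occupation is stored as numeric codes in occ1990, so the next step is to look at those codes.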

 

 

. tabulate occ1990 if occ1990==178|occ1990==95|occ1990==125

 

                 Occupation, 1990 basis |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                      Registered nurses |        966       68.37       68.37
                  Sociology instructors |          6        0.42       68.79
                                Lawyers |        441       31.21      100.00
----------------------------------------+-----------------------------------
                                  Total |      1,413      100.00

 

. tabulate occ1990 if occ1990==178|occ1990==95|occ1990==125, nolab

 

Occupation, |
 1990 basis |      Freq.     Percent        Cum.
------------+-----------------------------------
         95 |        966       68.37       68.37
        125 |          6        0.42       68.79
        178 |        441       31.21      100.00
------------+-----------------------------------
      Total |      1,413      100.00

 

* Take a look at my Excel sheet, the worksheet on "more regression fits".

 

. gen reduced_occ1990=1 if occ1990==95

no room to add more variables because of width
    An attempt was made to add a variable that would have increased the memory
    required to store an observation beyond what is currently possible.  You have
    the following alternatives:

     1.  Store existing variables more efficiently; see help compress.

     2.  Drop some variables or observations; see help drop.  (Think of Stata's
         data area as the area of a rectangle; Stata can trade off width and
         length.)

     3.  Increase the amount of memory allocated to the data area using the set
         memory command; see help memory.
r(902);

 

. clear all

 

. set mem 200m

 

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory          200M    max. data space                200.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                               203.163M

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta",

>  clear

 

. gen reduced_occ1990=1 if occ1990==95

(132744 missing values generated)

 

. tabulate occ1990 if occ1990==178|occ1990==95|occ1990==125

 

                 Occupation, 1990 basis |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                      Registered nurses |        966       68.37       68.37
                  Sociology instructors |          6        0.42       68.79
                                Lawyers |        441       31.21      100.00
----------------------------------------+-----------------------------------
                                  Total |      1,413      100.00

 

. tabulate occ1990 if occ1990==178|occ1990==95|occ1990==125, nolab

 

Occupation, |
 1990 basis |      Freq.     Percent        Cum.
------------+-----------------------------------
         95 |        966       68.37       68.37
        125 |          6        0.42       68.79
        178 |        441       31.21      100.00
------------+-----------------------------------
      Total |      1,413      100.00

 

. replace  reduced_occ1990=2 if occ1990==125

(6 real changes made)

 

. replace  reduced_occ1990=3 if occ1990==178

(441 real changes made)

 

. label define reduced_occ 1 "nurses" 2 "sociologists" 3 "lawyers"

 

. label val  reduced_occ1990 reduced_occ

 

. tabulate occ1990  reduced_occ1990 if  reduced_occ1990!=.

 

     Occupation, 1990 |         reduced_occ1990
                basis |    nurses  sociologi    lawyers |     Total
----------------------+---------------------------------+----------
    Registered nurses |       966          0          0 |       966
Sociology instructors |         0          6          0 |         6
              Lawyers |         0          0        441 |       441
----------------------+---------------------------------+----------
                Total |       966          6        441 |     1,413
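* A minimal Python sketch of the recoding logic in the gen/replace/label steps above: occ1990 codes 95, 125, and 178 become 1, 2, and 3, and every other occupation stays missing. The dictionary names and the sample codes (including the arbitrary code 31) are hypothetical, and Stata's missing value . is represented here by None.

```python
# Hypothetical names; the mapping itself follows the gen/replace commands above.
OCC_TO_REDUCED = {95: 1, 125: 2, 178: 3}
REDUCED_LABELS = {1: "nurses", 2: "sociologists", 3: "lawyers"}


def reduce_occ(occ1990):
    """Return the reduced code, or None (Stata's missing '.') for any other occupation."""
    return OCC_TO_REDUCED.get(occ1990)


sample = [95, 178, 125, 31, 95]  # made-up occ1990 codes
codes = [reduce_occ(c) for c in sample]
labels = [REDUCED_LABELS.get(c, ".") for c in codes]  # unlabeled missing shows as "."
print(codes)
print(labels)
```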

 

 

. *OK... Now I am ready to repeat the regression from the Excel file.

 

. desmat: regress incwage  reduced_occ1990 male

---------------------------------------------------------------------------------
   Linear regression
---------------------------------------------------------------------------------
   Dependent variable                                                    incwage
   Number of observations:                                                  1413
   F statistic:                                                           83.696
   Model degrees of freedom:                                                   3
   Residual degrees of freedom:                                             1409
   R-squared:                                                              0.151
   Adjusted R-squared:                                                     0.149
   Root MSE                                                            42234.929
   Prob:                                                                   0.000
---------------------------------------------------------------------------------
nr Effect                                                      Coeff        s.e.
---------------------------------------------------------------------------------
   reduced_occ1990
1    sociologists                                           -604.948   17320.322
2    lawyers                                               25723.531**  3256.452
   male
3    male                                                  17003.194**  3422.971
4  _cons                                                   36445.550**  1376.531
---------------------------------------------------------------------------------
*  p < .05
** p < .01

 

. *This is just a small income regression. Everyone except the nurses, sociologists, and lawyers is excluded automatically, because reduced_occ1990 is missing for them.
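* The desmat output shows that nurses and women are the omitted reference groups, so each respondent enters the model as a constant plus three 0/1 indicators. This Python sketch builds that row of regressors for each occupation-sex combination; design_row is a hypothetical helper, and this illustrates dummy coding in general rather than desmat's actual internals.

```python
# First level is the reference category (nurses), as in the desmat output.
OCCUPATIONS = ["nurses", "sociologists", "lawyers"]


def design_row(occupation, male):
    """Return [constant, sociologists, lawyers, male] for one respondent."""
    dummies = [1 if occupation == level else 0 for level in OCCUPATIONS[1:]]
    return [1] + dummies + [1 if male else 0]


for occ in OCCUPATIONS:
    for male in (False, True):
        print(f"{occ:>12} male={int(male)} -> {design_row(occ, male)}")
```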

 

. predict new_model

(option xb assumed; fitted values)

(132297 missing values generated)

* predict creates a new variable containing the model's predicted (fitted) values; I named it new_model.

 

. table  reduced_occ1990 sex, contents (freq mean incwage mean  new_model) row col

 

----------------------------------------------------
reduced_occ1 |                  Sex                
990          |        Male       Female        Total
-------------+--------------------------------------
      nurses |          62          904          966
             | 48602.45161   36777.9281  37536.85197
             |    53448.74     36445.55     37536.85
             |
sociologists |           2            4            6
             |       39200      42662.5  41508.33333
             |     52843.8      35840.6     41508.33
             |
     lawyers |         308          133          441
             | 80236.42208  59704.73684  74044.32653
             |    79172.27     62169.08     74044.33
             |
       Total |         372        1,041        1,413
             | 74743.46774  39729.70893  48947.76858
             |    74743.47     39729.71     48947.77
----------------------------------------------------

* Cell contents: frequency; mean of incwage (actual); mean of new_model (fitted).

 

. *Note 1: The constant term corresponds to the comparison group for both variables, which is female nurses. But the constant (36,445.55) is the model's predicted income for female nurses, not their actual mean income (36,777.93), and the two are not exactly the same. The regression has only one term for the gender gap (male = 17,003.19), so it assumes the same gender gap in every occupation.
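* Note 1 can be checked by hand. This Python sketch (an arithmetic check only, not part of the Stata session) rebuilds each cell's fitted value from the coefficients printed by the first regression and sets it beside the actual cell mean from the table. The fitted gender gap comes out identical in every occupation, while the fitted and actual cell means differ.

```python
# Coefficients copied from the first desmat regression (nurses and women are the reference groups).
B_CONS = 36445.550
B_MALE = 17003.194
B_OCC = {"nurses": 0.0, "sociologists": -604.948, "lawyers": 25723.531}

# Actual mean incwage by cell, copied from the table above: (men, women).
ACTUAL = {
    "nurses": (48602.45161, 36777.9281),
    "sociologists": (39200.0, 42662.5),
    "lawyers": (80236.42208, 59704.73684),
}


def fitted(occ, male):
    """Model 1 prediction: constant + occupation effect + one male effect shared by all occupations."""
    return B_CONS + B_OCC[occ] + (B_MALE if male else 0.0)


for occ, (actual_m, actual_f) in ACTUAL.items():
    fit_m, fit_f = fitted(occ, True), fitted(occ, False)
    print(f"{occ:>12}  men: actual {actual_m:9.2f} fitted {fit_m:9.2f}"
          f"  women: actual {actual_f:9.2f} fitted {fit_f:9.2f}"
          f"  fitted gap {fit_m - fit_f:9.2f}")
```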

 

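. *Also note that the overall mean income, the mean for each occupational group (across genders), and the mean for each gender (across occupations) are exactly the same for actual incwage and for the fitted values from the model. This happens because OLS residuals sum to zero within every group that has its own term in the model (the constant, each occupation dummy, and male), and therefore also within the omitted groups (nurses and women).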
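* A Python sketch of the same check for the margins: weighting each cell by its frequency from the table, the actual and fitted means agree for the whole sample, for each occupation, and for each sex (up to rounding in the printed coefficients), even though individual cells do not. The helper names are hypothetical.

```python
B_CONS, B_MALE = 36445.550, 17003.194
B_OCC = {"nurses": 0.0, "sociologists": -604.948, "lawyers": 25723.531}

# (occupation, sex, count, actual mean incwage), copied from the table above.
CELLS = [
    ("nurses", "male", 62, 48602.45161),
    ("nurses", "female", 904, 36777.9281),
    ("sociologists", "male", 2, 39200.0),
    ("sociologists", "female", 4, 42662.5),
    ("lawyers", "male", 308, 80236.42208),
    ("lawyers", "female", 133, 59704.73684),
]


def fitted(occ, sex):
    """Model 1 prediction for one occupation-sex cell."""
    return B_CONS + B_OCC[occ] + (B_MALE if sex == "male" else 0.0)


def group_means(group):
    """Count-weighted (actual, fitted) means over cells whose occupation or sex equals group."""
    cells = [c for c in CELLS if group in ("all", c[0], c[1])]
    n = sum(c[2] for c in cells)
    actual = sum(c[2] * c[3] for c in cells) / n
    fit = sum(c[2] * fitted(c[0], c[1]) for c in cells) / n
    return actual, fit


GROUPS = ["all", "nurses", "sociologists", "lawyers", "male", "female"]
for g in GROUPS:
    actual, fit = group_means(g)
    print(f"{g:>12}: actual {actual:9.2f}   fitted {fit:9.2f}")
```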

 

. rename  new_model fitted_values_inwage_1

 

* Here I am just giving the fitted values for incwage a more descriptive name.

 

. gen residuals=incwage- fitted_values_inwage_1

(132297 missing values generated)

 

* incwage is each respondent's actual wage income, and fitted_values_inwage_1 is the model's predicted income for that respondent. The residuals are the difference between the two (actual minus predicted).

 

. summarize  residuals

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   residuals |      1413    -.000951    42190.04  -79172.27   297118.4

 

* The residuals average zero (the -.000951 above is just rounding error) because, in an OLS regression with a constant term, the mean of the fitted values equals the mean of the actual values.
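* Another way to see why the residuals average zero: the Python sketch below sums count * (actual cell mean minus fitted value) over the model 1 cells. Each occupation's and each sex's residual sum is zero apart from rounding, so the overall sum is too. The small leftovers here come from rounding in the printed coefficients and means, not from the model; residual_sum is a hypothetical helper name.

```python
B_CONS, B_MALE = 36445.550, 17003.194
B_OCC = {"nurses": 0.0, "sociologists": -604.948, "lawyers": 25723.531}

# (occupation, sex, count, actual mean incwage), copied from the table above.
CELLS = [
    ("nurses", "male", 62, 48602.45161),
    ("nurses", "female", 904, 36777.9281),
    ("sociologists", "male", 2, 39200.0),
    ("sociologists", "female", 4, 42662.5),
    ("lawyers", "male", 308, 80236.42208),
    ("lawyers", "female", 133, 59704.73684),
]


def residual_sum(group):
    """Sum of (actual - fitted) over respondents, computed as count * (cell mean - cell fit)."""
    total = 0.0
    for occ, sex, n, mean in CELLS:
        if group in ("all", occ, sex):
            fit = B_CONS + B_OCC[occ] + (B_MALE if sex == "male" else 0.0)
            total += n * (mean - fit)
    return total


N = sum(n for _, _, n, _ in CELLS)  # number of respondents in the regression
for g in ["all", "nurses", "sociologists", "lawyers", "male", "female"]:
    print(f"{g:>12}: residual sum {residual_sum(g):8.3f}")
print(f"mean residual: {residual_sum('all') / N:.6f}")
```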

 

. *we have predicted values and residuals only for the 1413 cases in our 3 occupational groups
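* Arithmetic check of the case counts, in Python: gen left 132,744 missing values after assigning the 966 nurses, so the file holds 133,710 observations; predict left 132,297 missing, which leaves exactly the 1,413 nurses, sociologists, and lawyers.

```python
# Counts copied from the messages in this log.
missing_after_gen = 132744      # from: gen reduced_occ1990=1 if occ1990==95
nurses = 966                    # the only cases that gen gave a value
total_obs = missing_after_gen + nurses
missing_after_predict = 132297  # from: predict new_model
cases_with_predictions = total_obs - missing_after_predict
print(total_obs, cases_with_predictions)
```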

 

* If we want to fit occupational and gender income averages more fully, we need to account for the fact that the gender wage gap is different in each occupational group.
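* The actual gender gaps by occupation, computed in Python from the cell means in the table above, show why one common male term cannot fit every cell: the gap is about 11,800 for nurses, about 20,500 for lawyers, and negative for sociologists (based on only 6 cases), while the first model imposes 17,003 everywhere.

```python
# Actual mean incwage by cell, copied from the table above: (men, women).
ACTUAL = {
    "nurses": (48602.45161, 36777.9281),
    "sociologists": (39200.0, 42662.5),
    "lawyers": (80236.42208, 59704.73684),
}
MODEL1_GAP = 17003.194  # the single male coefficient from the first regression

gaps = {occ: men - women for occ, (men, women) in ACTUAL.items()}
for occ, gap in gaps.items():
    print(f"{occ:>12}: actual gap {gap:10.2f}   model 1 gap {MODEL1_GAP:10.2f}")
```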

 

. desmat: regress incwage  reduced_occ1990*male

---------------------------------------------------------------------------------
   Linear regression
---------------------------------------------------------------------------------
   Dependent variable                                                    incwage
   Number of observations:                                                  1413
   F statistic:                                                           50.578
   Model degrees of freedom:                                                   5
   Residual degrees of freedom:                                             1407
   R-squared:                                                              0.152
   Adjusted R-squared:                                                     0.149
   Root MSE                                                            42237.424
   Prob:                                                                   0.000
---------------------------------------------------------------------------------
nr Effect                                                      Coeff        s.e.
---------------------------------------------------------------------------------
   reduced_occ1990
1    sociologists                                           5884.572   21165.383
2    lawyers                                               22926.809**  3922.625
   male
3    male                                                  11824.524*   5545.056
   reduced_occ1990.male
4    sociologists.male                                    -15287.024   36996.590
5    lawyers.male                                           8707.162    7067.771
6  _cons                                                   36777.928**  1404.796
---------------------------------------------------------------------------------
*  p < .05
** p < .01

 

. *The asterisk in the desmat command gave us dummy variables not only for occupation and gender but also for each occupation-gender combination. Now the model has 6 terms (including the constant) for 6 occupation-gender cells, so it should fit our occupation-by-gender income table exactly.
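* A Python sketch checking that the interaction model reproduces the cell means: with nurses and women as the reference groups, each cell's fitted value is the constant plus its occupation term, plus, for men, the male term and the occupation-by-male term. The coefficients are copied from the desmat output above; with 6 parameters for 6 cells, any remaining differences are rounding.

```python
# Coefficients copied from the second desmat regression.
B = {
    "_cons": 36777.928,
    "sociologists": 5884.572,
    "lawyers": 22926.809,
    "male": 11824.524,
    "sociologists.male": -15287.024,
    "lawyers.male": 8707.162,
}

# Actual mean incwage by cell, copied from the table: (men, women).
ACTUAL = {
    "nurses": (48602.45161, 36777.9281),
    "sociologists": (39200.0, 42662.5),
    "lawyers": (80236.42208, 59704.73684),
}


def fitted(occ, male):
    """Model 2 prediction for one occupation-sex cell."""
    y = B["_cons"]
    if occ != "nurses":          # nurses are the reference occupation
        y += B[occ]
    if male:
        y += B["male"]
        if occ != "nurses":      # occupation-specific shift in the gender gap
            y += B[occ + ".male"]
    return y


for occ, (actual_m, actual_f) in ACTUAL.items():
    print(f"{occ:>12}  men: {actual_m:9.2f} vs {fitted(occ, True):9.2f}"
          f"  women: {actual_f:9.2f} vs {fitted(occ, False):9.2f}")
```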

 

. predict fitted_income_m2

(option xb assumed; fitted values)

(132297 missing values generated)

 

. table  reduced_occ1990 sex, contents (freq mean incwage mean   fitted_income_m2) row col

 

----------------------------------------------------
reduced_occ1 |                  Sex                
990          |        Male       Female        Total
-------------+--------------------------------------
      nurses |          62          904          966
             | 48602.45161   36777.9281  37536.85197
             |    48602.45     36777.93     37536.86
             |
sociologists |           2            4            6
             |       39200      42662.5  41508.33333
             |       39200      42662.5     41508.33
             |
     lawyers |         308          133          441
             | 80236.42208  59704.73684  74044.32653
             |    80236.42     59704.74     74044.33
             |
       Total |         372        1,041        1,413
             | 74743.46774  39729.70893  48947.76858
             |    74743.47     39729.71     48947.77
----------------------------------------------------

* Cell contents: frequency; mean of incwage (actual); mean of fitted_income_m2 (fitted).

 

. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace

file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved

 

. exit, clear