. set mem 50m

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.733M

    set memory           50M    max. data space                 50.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                                52.987M

 

. use cps_y2k_numeric.dta

 

. twoway (scatter health age)

 

. *That graph was hideous. It filled the whole plot region because, with 133K cases, every possible combination of health and age was represented.  What we want to see instead is the average health report by age.

.

. *I'm going to create a new variable for age by decade.  The int() function just strips off the fractional part, so ages 20 through 29 all become 2.

 

. gen age_by_decade=int(age/10)

 

. table age, contents (mean age_by_decade)

 

--------------------------

      p15 | mean(age_by~e)

----------+---------------

        0 |              0

        1 |              0

        2 |              0

        3 |              0

        4 |              0

        5 |              0

        6 |              0

        7 |              0

        8 |              0

        9 |              0

       10 |              1

       11 |              1

       12 |              1

       13 |              1

       14 |              1

       15 |              1

       16 |              1

       17 |              1

       18 |              1

       19 |              1

       20 |              2

       21 |              2

       22 |              2

       23 |              2

       24 |              2

       25 |              2

       26 |              2

       27 |              2

       28 |              2

       29 |              2

       30 |              3

       31 |              3

       32 |              3

       33 |              3

       34 |              3

       35 |              3

       36 |              3

       37 |              3

       38 |              3

       39 |              3

       40 |              4

       41 |              4

       42 |              4

       43 |              4

       44 |              4

       45 |              4

       46 |              4

       47 |              4

       48 |              4

       49 |              4

       50 |              5

       51 |              5

       52 |              5

       53 |              5

       54 |              5

       55 |              5

       56 |              5

       57 |              5

       58 |              5

       59 |              5

       60 |              6

       61 |              6

       62 |              6

       63 |              6

       64 |              6

       65 |              6

       66 |              6

       67 |              6

       68 |              6

       69 |              6

       70 |              7

       71 |              7

       72 |              7

       73 |              7

       74 |              7

       75 |              7

       76 |              7

       77 |              7

       78 |              7

       79 |              7

       80 |              8

       81 |              8

       82 |              8

       83 |              8

       84 |              8

       85 |              8

       86 |              8

       87 |              8

       88 |              8

       89 |              8

       90 |              9

--------------------------
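
*The table above confirms the binning one age at a time. A more compact check, and the by-decade average of health that motivated the new variable, might look like the sketch below. It was not run in this session, and it assumes health is coded so that its mean is meaningful.

```stata
* Check the binning: within decade d, ages should run from 10*d to 10*d+9.
tabstat age, by(age_by_decade) statistics(min max n)

* Mean health report by decade, using the same table syntax as this log.
table age_by_decade, contents(mean health)
```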

 

 

*Remember that all the graphs are made from the menus, not by typing at the command line.

 

. graph box ernval2, medtype(line) over(sex, label(angle(vertical) labsize(small))) over(age_by_decade)

 

. graph box ernval2 if age>19, medtype(line) over(sex, label(angle(vertical) labsize(small))) over(age_by_decade) nooutsides

 

. exit, clear

--------------------------------------------------------------------------------------------

       log:  class log 2_2004.log

  log type:  text

 opened on:  27 May 2004, 11:10:05

 

. *Now I want to show you a few nice things you can do with the table command.

.

. set mem 50m

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.733M

    set memory           50M    max. data space                 50.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                                52.987M

 

. use cps_y2k_numeric.dta, clear

 

. *If you want to compare earnings across racial groups, you could do something like this:

. sort  new_race

 

. by new_race: summarize ernval2 if age>19& age<30 [fweight=wgt2]

 

_______________________________________________________________________________

-> new_race = Non Hispanic White

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     ernval2 |  23761617     17402.6    18692.48          0     229339

 

_______________________________________________________________________________

-> new_race = Non Hispanic Black

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     ernval2 |   5035092    12706.78    14859.38          0     257525

 

_______________________________________________________________________________

-> new_race = NH American Indian

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     ernval2 |    322367       11999    15137.37          0     205817

 

_______________________________________________________________________________

-> new_race = NH Asian

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     ernval2 |   1744104    16716.66    20424.79          0     229339

 

_______________________________________________________________________________

-> new_race = Hispanic

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     ernval2 |   5494946    13050.39    14590.44          0     333564

 

 

. *This gives you the weighted number of observations, plus the weighted mean and standard deviation of earnings, for each value of new_race.

.

. *now look at this:

.

. table new_race if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq)

 

-------------------------------------------------

race and Hispanic  |

combined           | mean(ernval2)          Freq.

-------------------+-----------------------------

Non Hispanic White |       17402.6       2.38e+07

Non Hispanic Black |      12706.78        5035092

NH American Indian |         11999        322,367

          NH Asian |      16716.66        1744104

          Hispanic |      13050.39        5494946

-------------------------------------------------

 

. *Table gives you the same answer more concisely.

.

. *Now let's suppose you want to look at income by race and gender:

.

. *You could sort new_race sex, and then by new_race sex: summarize ernval2, but table makes it easier:

.

. table new_race sex if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq)

 

---------------------------------------

race and Hispanic  |        p20       

combined           |     male    female

-------------------+-------------------

Non Hispanic White | 20692.28  14142.71

                   | 1.18e+07  1.19e+07

                   |

Non Hispanic Black | 14393.64  11355.78

                   |  2239211   2795881

                   |

NH American Indian | 14200.51  10289.88

                   |  140,889   181,478

                   |

          NH Asian | 20740.33  12991.02

                   |  838,514   905,590

                   |

          Hispanic | 16493.41  9512.212

                   |  2784925   2710021

---------------------------------------

 

. *Notice how in each cell (that is, each combination of race and sex), we have the weighted average earnings and the weighted count of observations.
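
*For comparison, the by-group route mentioned before the table would look roughly like this. It is a sketch that was not run in this session; it should report the same weighted means, but as one block of summarize output per race-by-sex group instead of one compact table.

```stata
* bysort sorts on new_race and sex, then runs summarize within each group.
bysort new_race sex: summarize ernval2 if age>19 & age<30 [fweight=wgt2]
```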

. *You can run the same command again without the weights to get the unweighted counts, or you can try this:

.

. gen byte ones=1

 

. *This just creates a variable that's one for everybody, which we can use to get unweighted sums from table.

 

. table new_race sex if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq rawsum ones)

 

---------------------------------------

race and Hispanic  |        p20       

combined           |     male    female

-------------------+-------------------

Non Hispanic White | 20692.28  14142.71

                   | 1.18e+07  1.19e+07

                   |     4897      5213

                   |

Non Hispanic Black | 14393.64  11355.78

                   |  2239211   2795881

                   |      680       946

                   |

NH American Indian | 14200.51  10289.88

                   |  140,889   181,478

                   |      103       113

                   |

          NH Asian | 20740.33  12991.02

                   |  838,514   905,590

                   |      345       379

                   |

          Hispanic | 16493.41  9512.212

                   |  2784925   2710021

                   |     1190      1144

---------------------------------------

 

. *The freq statistic takes weights into account (and gives us the weighted frequency here), but the rawsum statistic ignores the weights, so the rawsum of ones is the unweighted count of observations in each cell.
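
*The other route to unweighted counts is simply to drop the weight. This is a sketch that was not run in this session; note that the means would then be unweighted as well.

```stata
* Same table without [fweight=wgt2]: freq now counts sample rows,
* so it should match the rawsum of ones from the weighted table.
table new_race sex if age>19 & age<30, contents(mean ernval2 freq)
```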

. *now let's create a third variable:

.

. *A familiar one, nativity

. tabulate citizen

 

                 citizenship p733 |      Freq.     Percent        Cum.

----------------------------------+-----------------------------------

                native born in US |    116,220       86.92       86.92

      native, born in territories |      1,090        0.82       87.73

native, born abroad of US parents |        976        0.73       88.46

        foreign born, naturalized |      5,348        4.00       92.46

     foreign born, non US citizen |     10,076        7.54      100.00

----------------------------------+-----------------------------------

                            Total |    133,710      100.00

 

. tabulate citizen, nolab

 

citizenship |

       p733 |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |    116,220       86.92       86.92

          2 |      1,090        0.82       87.73

          3 |        976        0.73       88.46

          4 |      5,348        4.00       92.46

          5 |     10,076        7.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. gen byte immigrant=0

 

. replace immigrant=1 if citizen>3

(15424 real changes made)

 

. label define 0 "US born" 1 "immigrant"

0 invalid name

r(198);

 

. label define imm_lbl 0 "US born" 1 "immigrant"

 

. label val immigrant imm_lbl
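
*A note on the error above: the first argument to label define is the name of the value label, and a Stata name has to start with a letter or an underscore, so 0 is rejected. The sketch below (not run in this session) builds the same indicator in one step and checks it. The name immigrant2 is made up so that it does not clash with immigrant. Because Stata treats missing values as larger than any number, the if-qualifier is there so that a missing citizen would stay missing instead of being coded 1.

```stata
* One-step alternative to gen + replace; immigrant2 is a hypothetical name.
* citizen>3 evaluates to 1 or 0; the if-qualifier leaves missing values missing.
gen byte immigrant2 = citizen>3 if !missing(citizen)
label values immigrant2 imm_lbl

* Check the recode against the source variable and show the label mapping.
tabulate citizen immigrant2, missing
label list imm_lbl
```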

 

. *Now let's look at a 3 way table

. table new_race sex immigrant if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq rawsum ones)

 

-------------------------------------------------------------

                   |            immigrant and p20           

race and Hispanic  | ----- US born ----    ---- immigrant ---

combined           |     male    female        male    female

-------------------+-----------------------------------------

Non Hispanic White | 20732.47  14231.42    19613.95  11255.55

                   | 1.14e+07  1.16e+07     424,922   355,745

                   |     4729      5072         168       141

                   |

Non Hispanic Black | 14760.95  11399.67    9874.854  10493.13

                   |  2070880   2660509     168,331   135,372

                   |      622       900          58        46

                   |

NH American Indian |  14201.4  10449.44    14178.08  6290.192

                   |  135,516   174,516       5,373     6,962

                   |      100       111           3         2

                   |

          NH Asian | 19099.27  13189.94    21342.07  12907.51

                   |  224,973   267,765     613,541   637,825

                   |      108       128         237       251

                   |

          Hispanic |  18517.5  11292.68    14793.59  7468.546

                   |  1271208   1448270     1513717   1261751

                   |      546       617         644       527

-------------------------------------------------------------

 

. *So here we have mean income (taking the weights into account), the weighted count, and the unweighted count for people in their 20s by race, sex and immigrant status.  Table can do a lot of work and report the results very compactly.

. *We could also have done:

. table new_race sex if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq rawsum ones) by(immigrant)

 

---------------------------------------

immigrant and race |

and Hispanic       |        p20       

combined           |     male    female

-------------------+-------------------

US born            |

Non Hispanic White | 20732.47  14231.42

                   | 1.14e+07  1.16e+07

                   |     4729      5072

                   |

Non Hispanic Black | 14760.95  11399.67

                   |  2070880   2660509

                   |      622       900

                   |

NH American Indian |  14201.4  10449.44

                   |  135,516   174,516

                   |      100       111

                   |

          NH Asian | 19099.27  13189.94

                   |  224,973   267,765

                   |      108       128

                   |

          Hispanic |  18517.5  11292.68

                   |  1271208   1448270

                   |      546       617

-------------------+-------------------

immigrant          |

Non Hispanic White | 19613.95  11255.55

                   |  424,922   355,745

                   |      168       141

                   |

Non Hispanic Black | 9874.854  10493.13

                   |  168,331   135,372

                   |       58        46

                   |

NH American Indian | 14178.08  6290.192

                   |    5,373     6,962

                   |        3         2

                   |

          NH Asian | 21342.07  12907.51

                   |  613,541   637,825

                   |      237       251

                   |

          Hispanic | 14793.59  7468.546

                   |  1513717   1261751

                   |      644       527

---------------------------------------

 

. exit, clear

 

*At this point I could have saved the data file, because I've added several new variables.
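
*Saving would have had to happen before the exit, clear. A minimal sketch, assuming a new, made-up filename so that the original cps_y2k_numeric.dta stays untouched:

```stata
* cps_y2k_class2.dta is a hypothetical filename, not one used in this class.
* The replace option overwrites that file if it already exists.
save cps_y2k_class2.dta, replace
```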