Last class and section notes

. set mem 50m

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.733M
    set memory          50M     max. data space                 50.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                                52.987M

. use cps_y2k_numeric.dta

. twoway (scatter health age)

. *That graph was hideous. With 133K cases, nearly every possible combination of age and health is occupied, so the points fill the whole plot and show no pattern. What we want to see instead is the average health report at each age.
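*A minimal sketch of one way to plot that average directly, not what was done in class: collapse the data to one mean per age and draw it as a line. It assumes health is coded numerically, it ignores the weights, and mean_health is a made-up variable name. preserve and restore put the full dataset back afterwards.

```stata
* Sketch only: mean_health is a hypothetical name, weights are ignored
preserve
collapse (mean) mean_health = health, by(age)
twoway (line mean_health age)
restore
```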

.

. *I'm going to create a new variable for age by decade. The int() function just strips off the fractional part, so ages 30-39 all become 3.

. gen age_by_decade=int(age/10)
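*A quick check of what int() does, as a sketch. int() truncates toward zero, so it differs from floor() only for negative numbers, which ages never are. decade_start is a hypothetical variable, not one used in class.

```stata
* int() drops the fractional part
display int(37/10)
* prints 3
display int(-3.7)
* prints -3 (truncation toward zero)
display floor(-3.7)
* prints -4 (rounding down)

* hypothetical alternative coding: 30 for ages 30-39, 40 for ages 40-49, ...
gen decade_start = 10*int(age/10)
```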

. table age, contents (mean age_by_decade)

--------------------------
      p15 | mean(age_by~e)
----------+---------------
        0 |              0
        1 |              0
        2 |              0
        3 |              0
        4 |              0
        5 |              0
        6 |              0
        7 |              0
        8 |              0
        9 |              0
       10 |              1
       11 |              1
       12 |              1
       13 |              1
       14 |              1
       15 |              1
       16 |              1
       17 |              1
       18 |              1
       19 |              1
       20 |              2
       21 |              2
       22 |              2
       23 |              2
       24 |              2
       25 |              2
       26 |              2
       27 |              2
       28 |              2
       29 |              2
       30 |              3
       31 |              3
       32 |              3
       33 |              3
       34 |              3
       35 |              3
       36 |              3
       37 |              3
       38 |              3
       39 |              3
       40 |              4
       41 |              4
       42 |              4
       43 |              4
       44 |              4
       45 |              4
       46 |              4
       47 |              4
       48 |              4
       49 |              4
       50 |              5
       51 |              5
       52 |              5
       53 |              5
       54 |              5
       55 |              5
       56 |              5
       57 |              5
       58 |              5
       59 |              5
       60 |              6
       61 |              6
       62 |              6
       63 |              6
       64 |              6
       65 |              6
       66 |              6
       67 |              6
       68 |              6
       69 |              6
       70 |              7
       71 |              7
       72 |              7
       73 |              7
       74 |              7
       75 |              7
       76 |              7
       77 |              7
       78 |              7
       79 |              7
       80 |              8
       81 |              8
       82 |              8
       83 |              8
       84 |              8
       85 |              8
       86 |              8
       87 |              8
       88 |              8
       89 |              8
       90 |              9
--------------------------

*Remember that all the graphs were made from the menus, not by typing at the command line.

. graph box ernval2, medtype(line) over(sex, label(angle(vertical) labsize(small))) over(age_by_decade)

. graph box ernval2 if age>19, medtype(line) over(sex, label(angle(vertical) labsize(small))) over(age_by_decade) nooutsides

. exit, clear

--------------------------------------------------------------------------------------------

log: class log 2_2004.log

log type: text

opened on: 27 May 2004, 11:10:05

. *Now I want to show you a few nice things you can do with the table command.

.

. set mem 50m

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.733M
    set memory          50M     max. data space                 50.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                                52.987M

. use cps_y2k_numeric.dta, clear

. *If you want to compare earnings across racial groups, you could do something like this:

. sort new_race

. by new_race: summarize ernval2 if age>19& age<30 [fweight=wgt2]

_______________________________________________________________________________
-> new_race = Non Hispanic White

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     ernval2 |  23761617     17402.6    18692.48          0     229339

_______________________________________________________________________________
-> new_race = Non Hispanic Black

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     ernval2 |   5035092    12706.78    14859.38          0     257525

_______________________________________________________________________________
-> new_race = NH American Indian

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     ernval2 |    322367       11999    15137.37          0     205817

_______________________________________________________________________________
-> new_race = NH Asian

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     ernval2 |   1744104    16716.66    20424.79          0     229339

_______________________________________________________________________________
-> new_race = Hispanic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     ernval2 |   5494946    13050.39    14590.44          0     333564

. *This gives you the weighted observations, and the weighted average and standard deviation of earnings by new_race

.

. *now look at this:

.

. table new_race if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq)

-------------------------------------------------
 race and Hispanic |
          combined | mean(ernval2)          Freq.
-------------------+-----------------------------
Non Hispanic White |       17402.6       2.38e+07
Non Hispanic Black |      12706.78        5035092
NH American Indian |         11999        322,367
          NH Asian |      16716.66        1744104
          Hispanic |      13050.39        5494946
-------------------------------------------------

. *Table gives you the same answer more concisely.

.

. *Now let's suppose you want to look at income by race and gender:

.

. *You could sort new_race sex and then run by new_race sex: summarize ernval2, but table makes it easier:

.

. table new_race sex if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq)

---------------------------------------
 race and Hispanic |        p20
          combined |     male    female
-------------------+-------------------
Non Hispanic White | 20692.28  14142.71
                   | 1.18e+07  1.19e+07
                   |
Non Hispanic Black | 14393.64  11355.78
                   |  2239211   2795881
                   |
NH American Indian | 14200.51  10289.88
                   |  140,889   181,478
                   |
          NH Asian | 20740.33  12991.02
                   |  838,514   905,590
                   |
          Hispanic | 16493.41  9512.212
                   |  2784925   2710021
---------------------------------------

. *Notice how in each cell (that is each combination of race and sex), we have the weighted average earnings and the weighted count of observations.
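*For comparison, the long-hand route mentioned before the table could be sketched like this. bysort sorts and runs by in one step; new_race, sex, ernval2 and wgt2 are the variables already used in this log. It produces one summarize block per race-by-sex cell rather than a single compact table.

```stata
* Sketch: one summarize block for each combination of new_race and sex
bysort new_race sex: summarize ernval2 if age>19 & age<30 [fweight=wgt2]
```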

. *You can run the same command again without the weights to get the unweighted counts, or you can try this:

.

. gen byte ones=1

. *This just creates a variable that equals one for everybody, which we can use to get unweighted sums (that is, counts) from table.

. table new_race sex if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq rawsum ones)

---------------------------------------
 race and Hispanic |        p20
          combined |     male    female
-------------------+-------------------
Non Hispanic White | 20692.28  14142.71
                   | 1.18e+07  1.19e+07
                   |     4897      5213
                   |
Non Hispanic Black | 14393.64  11355.78
                   |  2239211   2795881
                   |      680       946
                   |
NH American Indian | 14200.51  10289.88
                   |  140,889   181,478
                   |      103       113
                   |
          NH Asian | 20740.33  12991.02
                   |  838,514   905,590
                   |      345       379
                   |
          Hispanic | 16493.41  9512.212
                   |  2784925   2710021
                   |     1190      1144
---------------------------------------

. *freq takes the weights into account (so here it gives the weighted frequency), but rawsum ignores the weights. Both are statistics requested inside contents(), not separate commands.
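*A sketch of the other route mentioned earlier: rerun without the weights. Either command below should reproduce the rawsum ones row, provided no observation in these cells has a zero or missing weight (an assumption, not checked here). The contents() syntax is the one used throughout this log; newer Stata releases changed the table command, so treat the first command as tied to this version.

```stata
* Unweighted cell counts with the table syntax used in this log
table new_race sex if age>19 & age<30, contents(freq)

* The same counts as a plain two-way tabulation
tabulate new_race sex if age>19 & age<30
```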

. *now let's create a third variable:

.

. *A familiar one, nativity

. tabulate citizen

                 citizenship p733 |      Freq.     Percent        Cum.
----------------------------------+-----------------------------------
                native born in US |    116,220       86.92       86.92
      native, born in territories |      1,090        0.82       87.73
native, born abroad of US parents |        976        0.73       88.46
        foreign born, naturalized |      5,348        4.00       92.46
     foreign born, non US citizen |     10,076        7.54      100.00
----------------------------------+-----------------------------------
                            Total |    133,710      100.00

. tabulate citizen, nolab

citizenship |
       p733 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    116,220       86.92       86.92
          2 |      1,090        0.82       87.73
          3 |        976        0.73       88.46
          4 |      5,348        4.00       92.46
          5 |     10,076        7.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00

. gen byte immigrant=0

. replace immigrant=1 if citizen>3

(15424 real changes made)

. label define 0 "US born" 1 "immigrant"

0 invalid name

r(198);

. label define imm_lbl 0 "US born" 1 "immigrant"

. label val immigrant imm_lbl
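*A sketch of the same recode in one step, for reference. The first label define failed because a value label's name must be a valid Stata name, and names cannot begin with a digit. immigrant2 and imm_lbl2 are hypothetical names chosen so they do not clash with the variables above. The if citizen<. qualifier leaves missing values of citizen as missing instead of counting them as immigrants; in this file the tabulate above accounts for all 133,710 cases, so the result should be the same.

```stata
* 1 if foreign born (citizen codes 4 and 5), 0 otherwise, missing if citizen is missing
gen byte immigrant2 = (citizen>3) if citizen<.

* the label gets its own name, then is attached to the variable
label define imm_lbl2 0 "US born" 1 "immigrant"
label values immigrant2 imm_lbl2
tabulate immigrant2
```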

. *Now let's look at a 3 way table

. table new_race sex immigrant if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq rawsum ones)

-------------------------------------------------------------
                   |            immigrant and p20
 race and Hispanic | ----- US born ----    ---- immigrant ---
          combined |     male    female        male    female
-------------------+-----------------------------------------
Non Hispanic White | 20732.47  14231.42    19613.95  11255.55
                   | 1.14e+07  1.16e+07     424,922   355,745
                   |     4729      5072         168       141
                   |
Non Hispanic Black | 14760.95  11399.67    9874.854  10493.13
                   |  2070880   2660509     168,331   135,372
                   |      622       900          58        46
                   |
NH American Indian |  14201.4  10449.44    14178.08  6290.192
                   |  135,516   174,516       5,373     6,962
                   |      100       111           3         2
                   |
          NH Asian | 19099.27  13189.94    21342.07  12907.51
                   |  224,973   267,765     613,541   637,825
                   |      108       128         237       251
                   |
          Hispanic |  18517.5  11292.68    14793.59  7468.546
                   |  1271208   1448270     1513717   1261751
                   |      546       617         644       527
-------------------------------------------------------------

. *So here we have mean income (taking the weights into account), the weighted count, and the unweighted count for people in their 20s by race, sex, and immigrant status. Table can do a lot of work and report the results very compactly.

. *We could also have done:

. table new_race sex if age>19 & age<30 [fweight=wgt2], contents (mean ernval2 freq rawsum ones) by(immigrant)

---------------------------------------
immigrant and race |
      and Hispanic |        p20
          combined |     male    female
-------------------+-------------------
US born            |
Non Hispanic White | 20732.47  14231.42
                   | 1.14e+07  1.16e+07
                   |     4729      5072
                   |
Non Hispanic Black | 14760.95  11399.67
                   |  2070880   2660509
                   |      622       900
                   |
NH American Indian |  14201.4  10449.44
                   |  135,516   174,516
                   |      100       111
                   |
          NH Asian | 19099.27  13189.94
                   |  224,973   267,765
                   |      108       128
                   |
          Hispanic |  18517.5  11292.68
                   |  1271208   1448270
                   |      546       617
-------------------+-------------------
immigrant          |
Non Hispanic White | 19613.95  11255.55
                   |  424,922   355,745
                   |      168       141
                   |
Non Hispanic Black | 9874.854  10493.13
                   |  168,331   135,372
                   |       58        46
                   |
NH American Indian | 14178.08  6290.192
                   |    5,373     6,962
                   |        3         2
                   |
          NH Asian | 21342.07  12907.51
                   |  613,541   637,825
                   |      237       251
                   |
          Hispanic | 14793.59  7468.546
                   |  1513717   1261751
                   |      644       527
---------------------------------------

. exit, clear

*At this point I could have saved the data file, because I've added several new variables.
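*A sketch of that save step. It would have to run before exit, clear, since exiting discards the data in memory. cps_y2k_class.dta is a hypothetical file name chosen so the original file is not overwritten; add the replace option only if you mean to overwrite an existing file.

```stata
* Run before exit, clear. Hypothetical file name: keeps cps_y2k_numeric.dta untouched
save cps_y2k_class.dta
```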