HERE IS THE BASIC DRILL FOR MATCHING HUSBANDS TO WIVES. FIRST, MY DATASET CONTAINED ONLY MARRIED (SPOUSE PRESENT) INDIVIDUALS, SO WE EXPECT EVERYONE TO HAVE A SPOUSE MATCH. (THIS IS DIFFERENT FROM THE HOUSEHOLDER-PARTNER MATCH DESCRIBED BELOW, WHERE MOST HOUSEHOLDERS DO NOT HAVE A PARTNER.)
THE PROCEDURE IS THIS:
1) DOWNLOAD THE DATA AND SAVE AN INDIVIDUAL-LEVEL DATASET OF MARRIED PERSONS
2) KEEP ONE SEX, RENAME ALL THE VARIABLES APPROPRIATELY, AND SORT ON A UNIQUE COUPLE ID
3) OPEN THE INDIVIDUALS DATASET, KEEP THE OTHER SEX, RENAME THE VARIABLES APPROPRIATELY, THEN SORT ON UNIQUE COUPLE ID
4) MERGE THE TWO DATASETS TOGETHER ON THE UNIQUE COUPLE ID
5) CHECK RESULTS
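THE FIVE STEPS CAN BE SKETCHED AS A SHORT DO-FILE. THIS IS A MINIMAL SKETCH, NOT THE DO-FILE BEHIND THE LOG BELOW. IT ASSUMES STATA 11 OR LATER (FOR THE MERGE 1:1 SYNTAX), A WORKING FOLDER THAT ALREADY HOLDS INDIVIDUALS.DTA, AND A TWO-DIGIT YEAR CODE AS IN THIS EXTRACT. ONLY AGE AND RACE ARE RENAMED FOR BREVITY, AND THE ZERO-PADDED FORMATS IN THE COUPLE ID ARE AN ASSUMPTION CHOSEN TO KEEP THE ID FIXED-WIDTH.

```stata
* Step 2: husbands (sex==1), keyed on their own person number
use individuals, clear
keep if sex==1
rename age mage
rename race mrace
gen str14 cupid = string(year, "%02.0f") + string(statefip, "%02.0f") ///
    + string(serial, "%08.0f") + string(pernum, "%02.0f")
drop sex pernum sploc
save husbands, replace

* Step 3: wives (sex==2), keyed on the spouse's person number (sploc)
use individuals, clear
keep if sex==2
rename age fage
rename race frace
gen str14 cupid = string(year, "%02.0f") + string(statefip, "%02.0f") ///
    + string(serial, "%08.0f") + string(sploc, "%02.0f")
keep cupid fage frace

* Step 4: one-to-one merge (errors out if cupid repeats in either file)
merge 1:1 cupid using husbands

* Step 5: every record should show _merge==3
tabulate _merge
```

THE HUSBAND'S ID ENDS IN HIS OWN PERNUM AND THE WIFE'S ID ENDS IN HER SPLOC (THE PERSON NUMBER OF HER SPOUSE), WHICH IS WHAT MAKES THE TWO STRINGS AGREE WITHIN A COUPLE.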
-------------------------------------------------------------------------------
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\New stata
> files\20th cent compare ed and race intermar\an ed endogamy 1pct redo checke
> r for SF.log
log type: text
opened on: 5 Sep 2007, 23:19:35
. cd "F:\AAA Miker Data folder\1940-2000 1% married with race and ed"
F:\AAA Miker Data folder\1940-2000 1% married with race and ed
. do "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\mrosenfe_s
> tanford_edu_092.do"
. /* Important: you need to put the .dat and .do files in one folder/
> directory and then set the working folder to that folder. */
.
. set more off
.
. clear
. infix ///
> byte year 1-2 ///
> double serial 3-10 ///
> int hhwt 11-14 ///
> byte statefip 15-16 ///
> byte gq 17 ///
> byte pernum 18-19 ///
> byte sploc 20-21 ///
> int age 22-24 ///
> byte sex 25 ///
> byte marst 26 ///
> int race 27 ///
> long bpl 28-30 ///
> int hispan 31 ///
> byte educrec 32 ///
> int higrade 33-34 ///
> byte educ99 35-36 ///
> using mrosenfe_stanford_edu_092.dat
(5495702 observations read)
.
. tabulate year
Census year | Freq. Percent Cum.
------------+-----------------------------------
2000 | 1,152,040 20.96 20.96
1940 | 570,800 10.39 31.35
1960 | 809,206 14.72 46.07
1970 | 886,714 16.13 62.21
1980 | 993,668 18.08 80.29
1990 | 1,083,274 19.71 100.00
------------+-----------------------------------
Total | 5,495,702 100.00
. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\individuals.dta"
file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\individuals.dta saved
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 2,747,851 50.00 50.00
Female | 2,747,851 50.00 100.00
------------+-----------------------------------
Total | 5,495,702 100.00
. tabulate sex, nolab
Sex | Freq. Percent Cum.
------------+-----------------------------------
1 | 2,747,851 50.00 50.00
2 | 2,747,851 50.00 100.00
------------+-----------------------------------
Total | 5,495,702 100.00
. keep if sex==1
(2747851 observations deleted)
. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\husbands.dta"
file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\husbands.dta saved
. rename age mage
. rename race mrace
. rename hispan mhispan
. rename bpl mbpl
. rename educrec
varname required
r(100);
. rename educrec meducrec
. rename higrade mhigrade
. rename educ99 meduc99
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 2,747,851 100.00 100.00
------------+-----------------------------------
Total | 2,747,851 100.00
. drop sex
. tabulate year, nolab
Census year | Freq. Percent Cum.
------------+-----------------------------------
0 | 576,020 20.96 20.96
94 | 285,400 10.39 31.35
96 | 404,603 14.72 46.07
97 | 443,357 16.13 62.21
98 | 496,834 18.08 80.29
99 | 541,637 19.71 100.00
------------+-----------------------------------
Total | 2,747,851 100.00
. gen str14 cupid=string(year)+string(statefip)+string(serial)+string(pernum)
/*NOTE THAT APPLIES TO ALL COUPLE-ID GENERATING
STATEMENTS: IN NEWER VERSIONS OF STATA, OR WITH LARGE DATASETS AND
CORRESPONDINGLY LARGER VALUES OF SERIAL, YOU HAVE TO BE CAREFUL LEST STATA USE
A NON-FIXED-WIDTH VERSION OF THE NUMBER TO MAKE A STRING FROM, AND THEN LEAVE
YOU WITH DUPLICATE COUPLE IDENTIFIERS. FOR EXAMPLE, A SERIAL OF 251 WITH A
SPLOC OF 1 AND A SERIAL OF 25 WITH A SPLOC OF 11 WOULD BOTH COMBINE TO 2511.
WHAT YOU NEED TO DO IN RECENT VERSIONS OF STATA IS INSIST UPON FIXED-LENGTH
FORMATS FOR THE NUMBERS, WITH LEFT-JUSTIFIED NUMBERS SO THAT TRAILING SPACES
GET LEFT IN THE STRING. FOR INSTANCE:
gen str17 cupid=string(year, "%4.0f")+string(datanum, "%1.0f")+string(statefip, "%-2.0f")+string(serial, "%-8.0f")+string(pernum, "%-2.0f")
I REALIZE THAT THE ABOVE SYNTAX IS MORE UNWIELDY, BUT IF
THE SIMPLER SYNTAX RESULTS IN ANY DUPLICATE IDS, YOU HAVE TO RESORT TO THE MORE
COMPLICATED SYNTAX.
*/
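A SMALL STANDALONE CHECK OF THE PROBLEM DESCRIBED IN THE NOTE, USING TWO MADE-UP RECORDS (SERIAL 251 WITH PERSON NUMBER 1, AND SERIAL 25 WITH PERSON NUMBER 11). THE ZERO-PADDED FORMATS %08.0F AND %02.0F ARE AN ALTERNATIVE WAY OF FORCING A FIXED WIDTH, NOT THE SYNTAX USED IN THIS LOG. THE VARIABLE NAMES ID_LOOSE AND ID_FIXED ARE HYPOTHETICAL.

```stata
clear
input double serial byte pernum
251  1
 25 11
end

* Unpadded: both records collapse to the same string, "2511"
gen str10 id_loose = string(serial) + string(pernum)

* Fixed width with leading zeros: "0000025101" versus "0000002511"
gen str10 id_fixed = string(serial, "%08.0f") + string(pernum, "%02.0f")

list
duplicates report id_loose      // flags the collision in the unpadded ID
isid id_fixed                   // exits with an error unless the ID is unique
```

RUNNING ISID (OR DUPLICATES REPORT) ON THE COUPLE ID BEFORE MERGING IS A CHEAP WAY TO FIND OUT WHETHER THE SIMPLER SYNTAX IS SAFE FOR A GIVEN EXTRACT.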
. drop sploc
. sort cupid
. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\husbands.dta", replace
file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\husbands.dta saved
. clear all
. use "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\individuals.dta", clear
. keep if sex==2
(2747851 observations deleted)
. rename age fage
. rename marst fmarst
. tabulate marst
variable marst not found
r(111);
. tabulate fmarst
Marital status | Freq. Percent Cum.
------------------------+-----------------------------------
Married, spouse present | 2,747,851 100.00 100.00
------------------------+-----------------------------------
Total | 2,747,851 100.00
. drop fmarst
. rename race frace
. rename bpl fbpl
. rename hispan fhispan
. rename educrec feducrec
. rename higrade fhigrade
. rename educ99 feduc99
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Female | 2,747,851 100.00 100.00
------------+-----------------------------------
Total | 2,747,851 100.00
. drop sex
. gen str14 cupid=string(year)+string(statefip)+string(serial)+string(sploc)
. rename pernum fpernum
. rename sploc fsploc
. drop year serial hhwt statefip gq
. sort cupid
. merge cupid using husbands
(label educ99lbl already defined)
(label higradelbl already defined)
(label educreclbl already defined)
(label hispanlbl already defined)
(label bpllbl already defined)
(label racelbl already defined)
(label marstlbl already defined)
(label agelbl already defined)
(label gqlbl already defined)
(label statefiplbl already defined)
(label yearlbl already defined)
. tabulate _merge
_merge | Freq. Percent Cum.
------------+-----------------------------------
3 | 2,747,851 100.00 100.00
------------+-----------------------------------
Total | 2,747,851 100.00
. drop _merge
* _MERGE==3 FOR EVERY OBSERVATION MEANS A PERFECT 1-TO-1 MATCH OF HUSBANDS TO WIVES, WHICH IS WHAT WE WANT.
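THE SAME CHECK CAN BE WRITTEN SO THAT STATA STOPS IF THE MATCH IS NOT PERFECT. THIS IS A SKETCH ASSUMING STATA 11 OR LATER, WHERE MERGE TAKES A 1:1 PREFIX AND AN ASSERT() OPTION. THE MERGE IN THIS LOG USES THE OLDER SYNTAX, WHICH HAS NEITHER.

```stata
* With the wives' file in memory, as at the merge step in this log:
merge 1:1 cupid using husbands, assert(match)
* merge exits with an error if any husband or wife is left unmatched,
* or if cupid is duplicated in either file.
drop _merge
```

THE 1:1 PREFIX MATTERS BECAUSE A TABLE OF _MERGE ALONE DOES NOT REVEAL DUPLICATE IDS.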
. drop cupid
. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\couples.dta"
file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\couples.dta saved
. tabulate feducrec
Educational attainment |
recode | Freq. Percent Cum.
------------------------+-----------------------------------
None or preschool | 26,902 0.98 0.98
Grade 1, 2, 3, or 4 | 60,168 2.19 3.17
Grade 5, 6, 7, or 8 | 412,543 15.01 18.18
Grade 9 | 122,785 4.47 22.65
Grade 10 | 159,040 5.79 28.44
Grade 11 | 128,554 4.68 33.12
Grade 12 | 983,211 35.78 68.90
1 to 3 years of college | 497,851 18.12 87.02
4+ years of college | 356,797 12.98 100.00
------------------------+-----------------------------------
Total | 2,747,851 100.00
. tabulate feducrec, nolab miss
Educational |
attainment |
recode | Freq. Percent Cum.
------------+-----------------------------------
1 | 26,902 0.98 0.98
2 | 60,168 2.19 3.17
3 | 412,543 15.01 18.18
4 | 122,785 4.47 22.65
5 | 159,040 5.79 28.44
6 | 128,554 4.68 33.12
7 | 983,211 35.78 68.90
8 | 497,851 18.12 87.02
9 | 356,797 12.98 100.00
------------+-----------------------------------
Total | 2,747,851 100.00
. gen byte mBAplus=0
. replace mBAplus=1 if meducrec=9
invalid syntax
r(198);
. replace mBAplus=1 if meducrec==9
(492258 real changes made)
. gen byte fBAplus=0
. replace fBAplus=1 if feducrec==9
(356797 real changes made)
. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\couples.dta", replace
file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\couples.dta saved
. sort year
. tabulate mBAplus fBAplus if mage>19 & mage<30 & fage>19 & fage<30 & mbpl<100 & fbpl<100, ma
> tcell(matrix)
| fBAplus
mBAplus | 0 1 | Total
-----------+----------------------+----------
0 | 232,327 11,961 | 244,288
1 | 22,317 22,364 | 44,681
-----------+----------------------+----------
Total | 254,644 34,325 | 288,969
. tabulate mBAplus fBAplus if year==94 & mage>19 & mage<30 & fage>19 & fage<30 & mbpl<100 & f
> bpl<100, matcell(matrix)
| fBAplus
mBAplus | 0 1 | Total
-----------+----------------------+----------
0 | 33,840 421 | 34,261
1 | 1,267 544 | 1,811
-----------+----------------------+----------
Total | 35,107 965 | 36,072
. orse matrix
odds ratio is 34.512033
log odds ratio is 3.541308
SE of Log OR= .07093905
inverse OR (intermar OR)= .0289754
off-diag (or intermar) log OR= -3.541308
. display 33840*544/(421*1267)
34.512033
. *very close to what I had before..
* ORSE IS A LITTLE STATA PROGRAM I WROTE MYSELF TO DISPLAY THE ODDS RATIO, LOG ODDS RATIO AND STANDARD ERROR OF THE LOR FROM A 2X2 TABLE.
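THE SOURCE OF ORSE IS NOT SHOWN IN THIS LOG. WHAT FOLLOWS IS A MINIMAL SKETCH OF A PROGRAM THAT COMPUTES THE SAME THREE QUANTITIES FROM A 2X2 MATRIX OF COUNTS, USING THE STANDARD FORMULA SQRT(1/A + 1/B + 1/C + 1/D) FOR THE STANDARD ERROR OF THE LOG ODDS RATIO. THE NAME ORSE_SKETCH, THE MATRIX NAME T, AND THE OUTPUT WORDING ARE ASSUMPTIONS.

```stata
capture program drop orse_sketch
program define orse_sketch
    args M                          // name of a 2x2 matrix of cell counts
    tempname a b c d
    scalar `a' = `M'[1,1]
    scalar `b' = `M'[1,2]
    scalar `c' = `M'[2,1]
    scalar `d' = `M'[2,2]
    * odds ratio = (a*d)/(b*c); its log has SE sqrt(1/a + 1/b + 1/c + 1/d)
    display "odds ratio is " (`a'*`d')/(`b'*`c')
    display "log odds ratio is " ln((`a'*`d')/(`b'*`c'))
    display "SE of log OR = " sqrt(1/`a' + 1/`b' + 1/`c' + 1/`d')
end

* Usage with the cell counts from the 1940 table above
matrix T = (33840, 421 \ 1267, 544)
orse_sketch T
```

AFTER A TABULATE WITH THE MATCELL() OPTION, THE SAVED MATRIX CAN BE PASSED TO THE PROGRAM DIRECTLY INSTEAD OF TYPING THE COUNTS BY HAND.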
---------------------------------------------------------------------
WHAT FOLLOWS IS A SAMPLE LOG FOR IMPORTING CENSUS DATA AND MATCHING HEADS OF HOUSEHOLD TO PARTNERS. REMEMBER THAT EVERY PERSON IN THE HOUSEHOLD GIVES THEIR RELATIONSHIP TO THE HEAD OF THE HOUSEHOLD, SO ONLY PARTNERS OF THE HEAD OF HOUSEHOLD GET RECORDED. IN THIS PROCEDURE WE SEPARATE OUT HEADS OF HOUSEHOLD (RELATED==101) AND PARTNERS OF THE HEAD OF HH (RELATED==1114) AND THEN MATCH THEM. THE ID FOR MATCHING HEADS OF HH TO PARTNERS IS SIMPLY YEAR+SERIAL#, OR YEAR+STATE+SERIAL# TO BE MORE CERTAIN; THE LOG BELOW COVERS ONLY THE YEAR 2000, SO IT USES STATE+SERIAL#. EVERY HH CAN CONTAIN AT MOST ONE UNMARRIED PARTNER COUPLE. HEADS OF HH WITHOUT PARTNERS ARE DROPPED, AND WHAT IS LEFT IS HEADS OF HH AND THEIR PARTNERS. NOTE THAT THIS DATA EXTRACTION CREATES ONLY THE HOUSEHOLDER-PARTNER COUPLES. A DIFFERENT DATA EXTRACTION WOULD BE REQUIRED TO MATCH HUSBANDS TO WIVES.
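THE MATCHING LOGIC JUST DESCRIBED, CONDENSED INTO A SKETCH. IT ASSUMES STATA 11 OR LATER (MERGE 1:1 WITH THE KEEP() AND NOGENERATE OPTIONS), A FILE HOLDING ONLY HEADS AND PARTNERS, AND SERIAL STORED AS AN 8-CHARACTER STRING AS IN THE LOG BELOW. THE FILE NAMES INDIVIDUAL AND PARTNERS_SKETCH ARE HYPOTHETICAL, THE ZERO-PADDED STATE CODE IS AN ASSUMPTION, AND ONLY AGE AND SEX ARE RENAMED FOR BREVITY.

```stata
* Partners: at most one per household, so state + serial identifies the couple
use individual, clear
keep if related==1114
rename age page
rename sex psex
gen str10 cupid = string(statefip, "%02.0f") + serial
keep cupid page psex
save partners_sketch, replace

* Heads of household, keyed the same way
use individual, clear
keep if related==101
rename age hage
rename sex hsex
gen str10 cupid = string(statefip, "%02.0f") + serial
keep cupid hage hsex

* Keep only heads with a matched partner; unmatched heads are dropped
merge 1:1 cupid using partners_sketch, keep(match) nogenerate
```

UNLIKE THE HUSBAND-WIFE MATCH, NO PERSON NUMBER IS NEEDED IN THE ID, BECAUSE A HOUSEHOLD HAS ONE HEAD AND AT MOST ONE RECORDED PARTNER.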
-----------------------------------------------------------------------------------------------------
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\New stata files\family structur
> e\5% 2000 HH and partners.log
log type: text
opened on: 2 Feb 2004, 15:30:29
. set mem 600m
Current memory allocation
current memory usage
settable value description (1M = 1024k)
--------------------------------------------------------------------
set maxvar 5000 max. variables allowed 1.733M
set memory 600M max. data space 600.000M
set matsize 400 max. RHS vars in models 1.254M
-----------
602.987M
. cd "C:\AAA Miker Data folder\2000 5% for hh and partner"
C:\AAA Miker Data folder\2000 5% for hh and partner
. do "C:\AAA Miker Data folder\2000 5% for hh and partner\mrose009.do"
. infix using mrose009.dct if (related==101 | related==1114)
infix dictionary using mrose009.dat {
str8 serial 1- 8
int hhwt 9- 12
byte statefip 13- 14
byte metro 15- 15
int metareag 16- 18
* pernum 19- 20
int related 21- 24
byte age 25- 27
byte sex 28- 28
byte raceg 29- 29
byte marst 30- 30
int bplg 31- 33
byte hispang 34- 34
byte educrec 35- 35
}
(5527209 observations read)
.
. /*Important: you need to put the .dat, .do, and .dct files all in one folder/directory and then set the working folder to that folder. */
.
.
end of do-file
. describe
Contains data
obs: 5,527,209
vars: 13
size: 154,761,852 (75.4% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
serial str8 %9s Household serial number
hhwt int %8.0g hhwtlbl Household weight
statefip byte %57.0g statefiplbl
State (FIPS code)
metro byte %36.0g metrolbl
Metropolitan status
metareag int %43.0g metareaglbl
Metropolitan area -- General
related int %60.0g relatedlbl
Relationship to household head
-- Detailed
age byte %8.0g agelbl Age
sex byte %8.0g sexlbl Sex
raceg byte %23.0g raceglbl Race -- General
marst byte %26.0g marstlbl Marital status
bplg int %27.0g bplglbl Birthplace -- General
hispang byte %19.0g hispanglbl
Hispanic origin -- General
educrec byte %23.0g educreclbl
Educational attainment recode
-------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
. save "C:\AAA Miker Data folder\2000 5% for hh and partner\all heads and partners individual.dta"
file C:\AAA Miker Data folder\2000 5% for hh and partner\all heads and partners individual.dta sav
> ed
. tabulate related
Relationship to household head -- |
Detailed | Freq. Percent Cum.
----------------------------------------+-----------------------------------
Head/Householder | 5,273,998 95.42 95.42
Unmarried Partner | 253,211 4.58 100.00
----------------------------------------+-----------------------------------
Total | 5,527,209 100.00
. set mem 400m
Current memory allocation
current memory usage
settable value description (1M = 1024k)
--------------------------------------------------------------------
set maxvar 5000 max. variables allowed 1.733M
set memory 400M max. data space 400.000M
set matsize 400 max. RHS vars in models 1.254M
-----------
402.987M
. use "C:\AAA Miker Data folder\2000 5% for hh and partner\all heads and partners individual.dta",
> clear
. tabulate related
Relationship to household head -- |
Detailed | Freq. Percent Cum.
----------------------------------------+-----------------------------------
Head/Householder | 5,273,998 95.42 95.42
Unmarried Partner | 253,211 4.58 100.00
----------------------------------------+-----------------------------------
Total | 5,527,209 100.00
. tabulate related, nolab
Relationshi |
p to |
household |
head -- |
Detailed | Freq. Percent Cum.
------------+-----------------------------------
101 | 5,273,998 95.42 95.42
1114 | 253,211 4.58 100.00
------------+-----------------------------------
Total | 5,527,209 100.00
. keep if related==1114
(5273998 observations deleted)
THESE ARE THE PARTNERS
. save "C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta"
file C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta saved
. tabulate related
Relationship to household head -- |
Detailed | Freq. Percent Cum.
----------------------------------------+-----------------------------------
Unmarried Partner | 253,211 100.00 100.00
----------------------------------------+-----------------------------------
Total | 253,211 100.00
. drop related
. rename age page
. rename sex psex
. rename raceg praceg
. rename marst pmarst
. rename hispang phispang
. rename educrec peducrec
RENAME VARIABLES TO BE SURE THAT YOU CAN IDENTIFY PARTNER'S CHARACTERISTICS
. describe
Contains data from C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta
obs: 253,211
vars: 12 3 Feb 2004 11:21
size: 6,583,486 (98.4% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
serial str8 %9s Household serial number
hhwt int %8.0g hhwtlbl Household weight
statefip byte %57.0g statefiplbl
State (FIPS code)
metro byte %36.0g metrolbl
Metropolitan status
metareag int %43.0g metareaglbl
Metropolitan area -- General
page byte %8.0g agelbl Age
psex byte %8.0g sexlbl Sex
praceg byte %23.0g raceglbl Race -- General
pmarst byte %26.0g marstlbl Marital status
bplg int %27.0g bplglbl Birthplace -- General
phispang byte %19.0g hispanglbl
Hispanic origin -- General
peducrec byte %23.0g educreclbl
Educational attainment recode
-------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
. gen str10 cupid=string(statefip)+serial
MAKE THE COUPLE ID
. sort cupid
. drop serial
. rename bplg pbplg
SORT BY COUPLE ID, THEN SAVE, THEN GO BACK AND MAKE THE OTHER DATASET OF HOUSEHOLDERS
. save "C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta", replace
file C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta saved
. clear all
. use "C:\AAA Miker Data folder\2000 5% for hh and partner\all heads and partners individual.dta",
> clear
. keep if related==101
(253211 observations deleted)
RELATED==101 ARE THE HEADS OF HH
. drop hhwt metro metareag
. gen str10 cupid=string(statefip)+serial
. drop serial statefip related
. rename age hage
. rename sex hsex
. rename raceg hraceg
. rename marst hmarst
. rename bplg hbplg
. rename hispang hhispang
. rename educrec heducrec
. sort cupid
. merge cupid using "C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta"
(label statefiplbl already defined)
(label metrolbl already defined)
(label metareaglbl already defined)
(label sexlbl already defined)
(label raceglbl already defined)
(label marstlbl already defined)
(label bplglbl already defined)
(label hispanglbl already defined)
(label educreclbl already defined)
. tabulate _merge
_merge | Freq. Percent Cum.
------------+-----------------------------------
1 | 5,020,787 95.20 95.20
3 | 253,211 4.80 100.00
------------+-----------------------------------
Total | 5,273,998 100.00
. keep if _merge==3
(5020787 observations deleted)
MERGE==3 ARE THE HEADS OF HOUSEHOLD WHO HAVE PARTNERS MATCHED TO THEM. ALL THE OTHER HEADS OF HH WE THROW AWAY.
. tabulate _merge
_merge | Freq. Percent Cum.
------------+-----------------------------------
3 | 253,211 100.00 100.00
------------+-----------------------------------
Total | 253,211 100.00
. drop _merge cupid
. save "C:\AAA Miker Data folder\2000 5% for hh and partner\2000 hh and partner couples.dta"
file C:\AAA Miker Data folder\2000 5% for hh and partner\2000 hh and partner couples.dta saved
. describe
Contains data from C:\AAA Miker Data folder\2000 5% for hh and partner\2000 hh and partner couples
> .dta
obs: 253,211
vars: 18 3 Feb 2004 11:28
size: 6,583,486 (98.4% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
hage byte %8.0g agelbl Age
hsex byte %8.0g sexlbl Sex
hraceg byte %23.0g raceglbl Race -- General
hmarst byte %26.0g marstlbl Marital status
hbplg int %27.0g bplglbl Birthplace -- General
hhispang byte %19.0g hispanglbl
Hispanic origin -- General
heducrec byte %23.0g educreclbl
Educational attainment recode
hhwt int %8.0g hhwtlbl Household weight
statefip byte %57.0g statefiplbl
State (FIPS code)
metro byte %36.0g metrolbl
Metropolitan status
metareag int %43.0g metareaglbl
Metropolitan area -- General
page byte %8.0g agelbl Age
psex byte %8.0g sexlbl Sex
praceg byte %23.0g raceglbl Race -- General
pmarst byte %26.0g marstlbl Marital status
pbplg int %27.0g bplglbl Birthplace -- General
phispang byte %19.0g hispanglbl
Hispanic origin -- General
peducrec byte %23.0g educreclbl
Educational attainment recode
-------------------------------------------------------------------------------
Sorted by:
. clear all