/***Stata code for Todd E. Elder, John H. Goddeeris, and Steven J.
Haider, "Isolating the Roles of Individual Covariates in
Reweighting Estimation", Journal of Applied Econometrics.

The data used in the article are from the 1 percent PUMS file of the
2000 Census data distributed by IPUMS USA (Ruggles et al. (2010)).  We
analyze the black-white wage gap among males aged 25-59 employed in
the civilian labor force.  Given the large sample sizes available, we
use a 10% random sample and exclude observations with missing data on
any of the five covariates (education, potential experience, marital
status, occupation, and industry).  We also exclude observations with
hourly wages below $1 or above approximately $3000 (log wages below 0
or above 8).  Our final analysis sample consists of 213,908
observations for whites and 23,945 observations for blacks.

We have used the following IPUMS variables, as described on the IPUMS
website (https://usa.ipums.org/):

incwage         annual wage and salary income
uhrswork        usual hours of work
wkswork1        weeks worked last year
educd           educational attainment
age             respondent age
marst           categorical marital status
occ             3-digit numeric variable representing the person's primary
                occupation
ind             3-digit numeric variable representing the industry of the
                person's occ
hispand         classifies persons of Hispanic origin
raced           classifies person's race

The Stata code below recodes the raw IPUMS variables (Ruggles et al.
(2010)) into the variables that we use.

***/

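*Needed only in Stata 11 and earlier; later versions manage memory automatically and ignore this setting
set memory 30m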

*******************************************************
***Data setup*******************************************
*******************************************************

***First, read in the raw comma-delimited data
insheet age marst raced hispand educd occ ind wkswork1 uhrswork incwage using egh-data.txt, comma

label var age      "Age"
label var marst    "Marital status"
label var raced    "Race [detailed version]"
label var hispand  "Hispanic origin [detailed version]"
label var educd    "Educational attainment [detailed version]"
label var occ      "Occupation"
label var ind      "Industry"
label var wkswork1 "Weeks worked last year"
label var uhrswork "Usual hours worked per week"
label var incwage  "Wage and salary income"

***Create log wage variable
gen wage=incwage/(uhrswork*wkswork1)
gen lwage=log(wage)
drop if lwage==.
drop if lwage<0 | lwage > 8
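
*** Illustrative check (not part of the original code): the log-wage bounds
*** above are meant to match the hourly-wage limits described in the header,
*** since exp(0) = 1 and exp(8) is roughly 2981, i.e. close to $3000.
display "Implied hourly-wage bounds: " exp(0) " to " %9.2f exp(8)
*Every remaining observation should have a log wage between 0 and 8
assert lwage>=0 & lwage<=8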

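*** Education categories, based on actual grade or degree completed (educd)
* 1: less than 12th grade
gen     educat=1 if educd>0 & educd<60
* 2: 12th grade, with or without a diploma
replace educat=2 if educd>=60 & educd<65
* 3: some college or associate degree
replace educat=3 if educd>=65 & educd<100
* 4: bachelor's degree or higher
replace educat=4 if educd>=100 & educd<120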
tab educat, gen(eddum)
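
*** Illustrative check (not part of the original code): count observations
*** left without an education category, e.g. educd codes of 0 or of 120 and
*** above, which the ranges above do not cover.  These observations have
*** missing eddum dummies.
count if educat>=.
*List the unclassified educd codes, if any (capture guards against an empty table)
capture noisily tab educd if educat>=., missing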

gen     agecat=1 if age<20
replace agecat=2 if age>=20 & age<25
replace agecat=3 if age>=25 & age<30
replace agecat=4 if age>=30 & age<35
replace agecat=5 if age>=35 & age<40
replace agecat=6 if age>=40 & age<45
replace agecat=7 if age>=45 & age<50
replace agecat=8 if age>=50 & age<55
replace agecat=9 if age>=55 & age<60
tab agecat, gen(agedum)

*** NOTE: Potential experience = age - 6 - imputed years of schooling, using Angrist and Chen's method to impute actual schooling from educd
*No schooling 
gen exp = age - 6 - 0 if educd ==2  
*Nursery - 4th grade
replace exp = age - 6 - 2.5 if educd==10
*5th-6th grade
replace exp = age - 6 - 5.5 if educd==21
*7th-8th grade
replace exp = age - 6 - 7.5 if educd==24
*9th grade
replace exp = age - 6 - 9 if educd==30 
*10th grade
replace exp = age - 6 - 10 if educd==40 
*11th grade
replace exp = age - 6 - 11 if educd==50
*12th grade no diploma
replace exp = age - 6 - 11.38 if educd==61
*HS graduate
replace exp = age - 6 - 12 if educd==62
*<1 year college
replace exp = age - 6 - 12.55 if educd==65 
*1+ year college, no degree
replace exp = age - 6 - 13.35 if educd==71
*Associate degree
replace exp = age - 6 - 14 if educd==81
*Bachelors degree
replace exp = age - 6 - 16 if educd==101
*Masters, professional, or doctorate degree
replace exp = age - 6 - 18 if educd>=114 & educd<=116

gen expcat=1 if exp<10
replace expcat=2 if exp>=10 & exp<15
replace expcat=3 if exp>=15 & exp<20
replace expcat=4 if exp>=20 & exp<25
replace expcat=5 if exp>=25 & exp<30
replace expcat=6 if exp>=30 & exp<35
replace expcat=7 if exp>=35 & exp<100
tab expcat, gen(expdum)
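
*** Illustrative check (not part of the original code): exp is assigned only
*** for the educd codes listed above, so any other code leaves exp, and
*** therefore expcat, missing.  Count those cases, and count negative values
*** of exp, which fall into expcat==1.
count if exp>=.
count if exp<0
*Missing exp should always imply missing expcat
assert expcat>=. if exp>=.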

***Marital status: married equals 1 for marst codes 1, 2, or 3 and 0 otherwise, so it is never missing
gen married=(marst==1 | marst==2 | marst==3)
tab married

*** OCCUPATION categories
* management/professional
gen     occcat=1 if occ>=1   & occ<360 
* service
replace occcat=2 if occ>=360 & occ<470
*Sales / office
replace occcat=3 if occ>=470 & occ<600
* Farming / fishery
replace occcat=4 if occ>=600 & occ<620
*Construction and mining
replace occcat=5 if occ>=620 & occ<770
*Production and transportation
replace occcat=6 if occ>=770 & occ<980

tab occcat, gen(occdum)

***INDUSTRY categories
*Ag, mining, construction, and utilities
gen     indcat=1 if ind>=1   & ind<100
*Manufacturing
replace indcat=2 if ind>=100 & ind<400
*Wholesale and retail trade
replace indcat=3 if ind>=400 & ind<600
*Transportation
replace indcat=4 if ind>=600 & ind<640
*Information and Communications, FIRE
replace indcat=5 if ind>=640 & ind<720  
** Professional/Scientific/Management
replace indcat=6 if ind>=720 & ind<780
** Education/Health and Social Services, Arts, entertainment, recreation, accommodations, food services, other services
replace indcat=7 if (ind>=780 & ind<850) | (ind>=930 & ind<960) 
*Public Administration
replace indcat=8 if ind>=850 & ind<930

tab indcat, gen(inddum)
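
*** Illustrative check (not part of the original code): count observations
*** whose occ or ind codes fall outside the ranges above (for example
*** occ==0 or occ>=980, and ind==0 or ind>=960).  These are left unclassified
*** and so have missing occdum or inddum dummies.
count if occcat>=.
count if indcat>=.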

** RACE categories
* NOTE: Many detailed race codes are not assigned to any group
gen     maingrp=3 if hispand>=100 & hispand<200
replace maingrp=4 if hispand==200
replace maingrp=1 if raced==100
replace maingrp=2 if raced==200 | raced==210
replace maingrp=6 if raced>=300 & raced<400
replace maingrp=5 if raced>=400 & raced<680
tab maingrp
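
*** Illustrative check (not part of the original code): the raced-based
*** replace statements run after the hispand-based ones and so take
*** precedence.  Count respondents who met a hispand condition above but
*** were then assigned to group 1 or 2 on the basis of raced.
count if (maingrp==1 | maingrp==2) & hispand>=100 & hispand<=200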

by maingrp, sort: sum lwage wage

***Keep non-Hispanic whites and blacks
keep if maingrp==1 | maingrp==2 
*Recode blacks from 2 to 0, so that maingrp equals 1 for whites and 0 for blacks
replace maingrp=0 if maingrp==2


***Drop if any values of the covariates are missing - the code is not set up to handle missing values well
global mainreg "eddum2 eddum3 eddum4 expdum2 expdum3 expdum4 expdum5 expdum6 expdum7 married occdum2 occdum3 occdum4 occdum5 occdum6 inddum2 inddum3 inddum4 inddum5 inddum6 inddum7 inddum8"
foreach var in $mainreg { 
   drop if `var'>=.
   }
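
*** Illustrative check (not part of the original code): count the final
*** analysis sample by group.  The header reports 213,908 whites (maingrp==1)
*** and 23,945 blacks (maingrp==0); whether a given extract reproduces those
*** counts exactly is not verified here.
count if maingrp==1
count if maingrp==0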

