/***********************************
************************************
***INTRODUCTION
************************************
************************************

************************************
***AMERICAN COMMUNITY SURVEY (ACS) DATA
************************************

Download the ACS resources for this course from this link:
http://www.ernestoamaral.com/docs/Stata2020a/course.zip

Uncompress the file on your computer.
This procedure varies across computers.
It creates a folder called "course" with four sub-folders:
"data", "documents", "output", "programs".

Save the "course" folder in a specific location on your computer,
following these suggestions for Windows and Macintosh.

************************************
***WINDOWS
************************************

Save the uncompressed "course" folder under the C:\ drive.
You can see the C:\ drive under "Computer" or "My PC" in File Explorer.

************************************
***MACINTOSH
************************************

Save the uncompressed "course" folder under Macintosh HD (Hard Drive).

To show Macintosh HD on your Finder sidebar:
- Open Finder
- Click on "Finder" menu
- Click on "Preferences..."
- Click on the "Sidebar" tab
- Select "Hard disks"

************************************
***CREATE SUB-FOLDERS
************************************

Some computers might ask for your password
to save files in the C:\ drive or Macintosh HD.

You might have to:
1) create an empty "course" folder
2) copy the "data", "documents", "output", and "programs" folders
   to this new folder

These steps vary across computers.

If you are not able to follow these procedures,
you can save the "course" folder in any location of your preference.
In that case, adjust the folder paths in the "log using" and "global"
commands of this do-file to match that location.

************************************
***"COURSE" SUB-FOLDERS
************************************

"data": American Community Survey microdata
"documents": questionnaires and other documents
"programs": This folder will be empty.
You will save Stata do-files and log-files here throughout the course.
"output": This folder will be empty.
You will save tables and figures here throughout the course.
*/

************************************
***CLEAR MEMORY
************************************

clear all

***Run only one of the next two blocks (WINDOWS or MACINTOSH),
***the one that matches your computer. Running both fails at the
***second "log using" because a log is already open.

************************************
***WINDOWS
************************************

***Start saving results window
log using "C:\course\programs\Stata01.log", replace text

***Shortcut for folders
global data = "C:\course\data"
global output = "C:\course\output"

************************************
***MACINTOSH
************************************

***Start saving results window
log using "/course/programs/Stata01.log", replace text

***Shortcut for folders
global data = "/course/data"
global output = "/course/output"

************************************
***OPENING COMMANDS
************************************

***Tell Stata to not pause for "more" messages
set more off

***Change directory
cd "$data"

***Open 2018 ACS (only Texas)
use "ACS2018TX.dta", clear

************************************
***SAMPLE SIZE
************************************

count

***Year (the three commands below are equivalent ways to include missing cases,
***except the first, which omits them)
tabulate year
tab year, missing
tab year, m

/*
************************************
***RELATIONAL OPERATORS
************************************

The relational operators are:
>  (greater than)
<  (less than)
>= (greater than or equal)
<= (less than or equal)
== (equal)
!= (not equal)

Observe that the relational operator for equality is a pair of equal signs.
This convention distinguishes the test for equality (==) from
assignment (=), which is used to generate or replace a variable.
See the example below in the SEX section.
*/

************************************
***SEX
************************************

tab sex
tab sex, m // show missing cases
tab sex, nolabel // hide label

***List names of value labels
label dir

***List names and contents of sex label
label list sex_lbl

***Generate dummy variable for female
generate female=.
replace female=0 if sex==1 // Male
replace female=1 if sex==2 // Female

***Verify new variable
tab sex female, m

***Create label for variable
label variable female "Sex"

***Create labels for categories
label define female 0 "Male" 1 "Female"

***Assign labels for categories
label values female female

***Verify new variable with labels
tab female
tab sex female, m

************************************
***RACE/ETHNICITY
************************************

***Race
tab race, m

***Ethnicity
tab hispan, m

***List names and contents of race and ethnicity labels
label list race_lbl hispan_lbl

***Generate race/ethnicity variable
gen raceth=.
replace raceth=1 if race==1 & hispan==0 // White
replace raceth=2 if race==2 & hispan==0 // Black
replace raceth=3 if hispan>=1 & hispan<=4 // Hispanic
replace raceth=4 if (race==4 | race==5 | race==6) & hispan==0 // Asian
replace raceth=5 if race==3 & hispan==0 // Native American
replace raceth=6 if (race==7 | race==8 | race==9) & hispan==0 // Other

***Create label for variable
label variable raceth "Race/ethnicity"

***Create labels for categories
label define raceth 1 "White" 2 "African American" 3 "Hispanic" ///
    4 "Asian" 5 "Native American" 6 "Other races"

***Assign labels for categories
label values raceth raceth

***Verify new variable
tab raceth, m
tab race raceth, m
tab hispan raceth, m

************************************
***AGE
************************************

sum age, d

***Generate age group variable - manually
gen agegr1=.
replace agegr1=0 if age>=0 & age<=15
replace agegr1=16 if age>=16 & age<=19
replace agegr1=20 if age>=20 & age<=24
replace agegr1=25 if age>=25 & age<=34
replace agegr1=35 if age>=35 & age<=44
replace agegr1=45 if age>=45 & age<=54
replace agegr1=55 if age>=55 & age<=64
replace agegr1=65 if age>=65 & age<=100

***Verify new variable
***(contents() is the table syntax of Stata 16 and earlier;
***Stata 17 or later uses statistic() instead)
tab agegr1, m
table agegr1, contents(min age max age count age)

***Generate age group variable - automatically
***cut() sets values at or above the last break to missing,
***so the last break is 101 to keep age 100 in the 65-100 group
egen agegr2 = cut(age), at(0,16,20,25,35,45,55,65,101)

***Verify new variable
tab agegr2, m
table agegr2, contents(min age max age count age)

***Create label for variables
label variable agegr1 "Age group"
label variable agegr2 "Age group"

***Create labels for categories
label define agecode 0 "0-15" 16 "16-19" 20 "20-24" 25 "25-34" ///
    35 "35-44" 45 "45-54" 55 "55-64" 65 "65-100"

***Assign labels for categories
label values agegr1 agegr2 agecode

***Verify new variables
tab agegr1, m
tab age agegr1, m
tab agegr2, m
tab age agegr2, m

************************************
***EDUCATIONAL ATTAINMENT
************************************

tab educ, m

***List names and contents of education label
label list educ_lbl

***Generate new educational attainment variable
gen educgr=.
replace educgr=1 if educ>=0 & educ<=5 // Less than high school
replace educgr=2 if educ==6 // High school
replace educgr=3 if educ==7 | educ==8 // Some college
replace educgr=4 if educ==10 // College
replace educgr=5 if educ==11 // 5+ years of college, graduate school

***Create label for variable
label variable educgr "Educational attainment"

***Create labels for categories
label define educgr 1 "Less than high school" 2 "High school" ///
    3 "Some college" 4 "College" 5 "Graduate school"

***Assign labels for categories
label values educgr educgr

***Verify new variable
tab educgr, m
tab educ educgr, m

************************************
***MARITAL STATUS
************************************

tab marst, m

***List names and contents of marital status label
label list marst_lbl

***Generate new marital status variable
gen marital=.
replace marital=1 if marst==1 | marst==2 // Married
replace marital=2 if marst>=3 & marst<=5 // Separated, divorced, widowed
replace marital=3 if marst==6 // Never married, single

***Create label for variable
label variable marital "Marital status"

***Create labels for categories
label define marital 1 "Married" 2 "Separated, divorced, widowed" 3 "Never married"

***Assign labels for categories
label values marital marital

***Verify new variable
tab marital, m
tab marst marital, m

************************************
***MIGRATION STATUS (detailed version, 7 categories)
************************************

tab migrate1d, m
tab migrate1d, m nolabel

***List names and contents of migration status label
label list migrate1d_lbl

***Who is coded as not applicable (N/A)?
tab age if migrate1d==0

***Generate new migration status variable
gen migrant=.
replace migrant=1 if migrate1d==10 | migrate1d==23 // same house or within PUMA
replace migrant=2 if migrate1d>=24 & migrate1d<=32 // internal migrant
replace migrant=3 if migrate1d==40 // international migrant

***Create label for variable
label variable migrant "Migration status"

***Create labels for categories
label define migrant 1 "Non-migrant" 2 "Internal migrant" 3 "International migrant"

***Assign labels for categories
label values migrant migrant

***Verify new variable
tab migrant, m
tab migrate1d migrant, m

************************************
***WAGE AND SALARY INCOME
************************************

sum incwage, d

***Generate new income variable
gen income=.
replace income=incwage if incwage!=999999

***Create label for variable
label variable income "Wage and salary income"

***Verify number of missing cases
codebook income
count if income==.

***Verify new variable
sum income, d
hist income, percent

************************************
***CLOSING COMMANDS
************************************

***Save data
save "Stata01.dta", replace

***Save log
log close
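
************************************
***OPTIONAL: RECODE AS AN ALTERNATIVE
************************************

/*
A minimal sketch, not part of the original course steps.
The gen/replace/label sequence used for marital status earlier in this
do-file could also be written with a single recode command.
The name marital2 (new variable and its value label) is hypothetical
and chosen only for this illustration.
This block runs after the data are saved and the log is closed,
so marital2 is not stored in Stata01.dta and its output is not logged.
*/

recode marst (1 2 = 1 "Married") ///
    (3/5 = 2 "Separated, divorced, widowed") ///
    (6 = 3 "Never married"), generate(marital2)

***The two versions should agree (all cases on the diagonal)
tab marital marital2, m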