************************************
************************************
***SOCI 600: INTRODUCTION TO SOCIOLOGICAL DATA ANALYSIS
***HYPOTHESIS TESTING: ONE- AND TWO-SAMPLE CASES
************************************
************************************

************************************
***CLEAR MEMORY
************************************
clear all

************************************
***CREATE SHORTCUTS AND LOG FILE
************************************

***Shortcut for folders
global codes = "H:\course\codes"
global data = "H:\course\data"
global output = "H:\course\output"

***Start saving results window
log using "$codes\Stata05.log", replace text

************************************
***OPENING COMMANDS
************************************

***Tell Stata to not pause for "more" messages
set more off

***Open 2019 ACS (only Texas)
use "$data\ACS2019.dta", clear

***Complex survey design
svyset cluster [pweight=perwt], strata(strata) singleunit(scaled)

************************************
***GENERATE VARIABLES
************************************

***Sex
gen female=.
replace female=0 if sex==1 // Male
replace female=1 if sex==2 // Female
label define female 0 "Male" 1 "Female"
label values female female

***Race/ethnicity
gen raceth=.
replace raceth=1 if race==1 & hispan==0 // White
replace raceth=2 if race==2 & hispan==0 // Black
replace raceth=3 if hispan>=1 & hispan<=4 // Hispanic
replace raceth=4 if (race==4 | race==5 | race==6) & hispan==0 // Asian
replace raceth=5 if race==3 & hispan==0 // Native American
replace raceth=6 if (race==7 | race==8 | race==9) & hispan==0 // Other
label define raceth 1 "White" 2 "African American" 3 "Hispanic" ///
    4 "Asian" 5 "Native American" 6 "Other races"
label values raceth raceth

***Age
egen agegr = cut(age), at(0,16,20,25,35,45,55,65,100)
label define agecode 0 "0-15" 16 "16-19" 20 "20-24" 25 "25-34" ///
    35 "35-44" 45 "45-54" 55 "55-64" 65 "65-100"
label values agegr agecode

***Educational attainment
gen educgr=.
replace educgr=1 if educ>=0 & educ<=5 // Less than high school
replace educgr=2 if educ==6 // High school
replace educgr=3 if educ==7 | educ==8 // Some college
replace educgr=4 if educ==10 // College
replace educgr=5 if educ==11 // 5+ years of college, graduate school
label define educgr 1 "Less than high school" 2 "High school" ///
    3 "Some college" 4 "College" 5 "Graduate school"
label values educgr educgr

***Marital status
gen marital=.
replace marital=1 if marst==1 | marst==2 // Married
replace marital=2 if marst>=3 & marst<=5 // Separated, divorced, widowed
replace marital=3 if marst==6 // Never married, single
label define marital 1 "Married" 2 "Separated, divorced, widowed" 3 "Never married"
label values marital marital

***Migration status
gen migrant=.
replace migrant=1 if migrate1d==10 | migrate1d==23 // same house or within PUMA
replace migrant=2 if migrate1d>=24 & migrate1d<=32 // internal migrant
replace migrant=3 if migrate1d==40 // international migrant
label define migrant 1 "Non-migrant" 2 "Internal migrant" 3 "International migrant"
label values migrant migrant

***Wage and salary income
gen income=.
replace income=incwage if incwage!=999999

************************************
***ONE-SAMPLE Z-TEST AND t-TEST
************************************

***Personal income of Veterans in Texas (ACS)
***compared to US population 15+ (U.S. Census Bureau)
***https://fred.stlouisfed.org/series/MAPAINUSA646N#
***Mean personal income of US population in 2019 (U.S. Census Bureau) = 54,129

***Generate dummy variable for Veterans
tab vetstat, m
gen veteran=.
replace veteran=1 if vetstat==2 // Veteran
replace veteran=0 if vetstat==1 // Not a Veteran
tab vetstat veteran, m

***Mean wage and salary income of Veterans in Texas in 2019 (ACS)
svy, subpop(if veteran==1): mean income
mean income if veteran==1

***Is the mean income of Veterans in Texas ($30,511.68) significantly lower
***than mean income of US population 15+ ($54,129)?
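
***Design-based check (a sketch, not part of the original comparison)
***The one-sample commands in this section ignore weights and survey design.
***One possible alternative, assuming the svyset declaration above and the
***veteran and income variables created earlier, is a Wald test on the
***survey-weighted mean. The mean is re-estimated first because test and
***lincom use the most recent estimation results. Both are two-sided.
svy, subpop(if veteran==1): mean income
test _b[income] = 54129
lincom _b[income] - 54129 // weighted difference from the benchmark, with CI
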
***Z-test (it does not allow the use of weight or complex survey design)
***Note: ztest treats the population standard deviation as known
***and uses sd(1) unless the sd() option is specified
ztest income=54129 if veteran==1

***t-test (it does not allow the use of weight or complex survey design)
ttest income=54129 if veteran==1

************************************
***ONE-SAMPLE TEST OF PROPORTIONS
************************************

***Gender composition of sample of adult population (18+) in Texas (ACS)
***compared to US population
***https://www.census.gov/quickfacts/fact/table/US/PST045221 (U.S. Census Bureau)
***https://data.worldbank.org/indicator/SP.POP.TOTL.FE.ZS?locations=US (World Bank)
***Percentage of women in the US population in 2019 (U.S. Census Bureau) = 50.5%

***Percentage of 18+ women in Texas in 2019 (ACS)
tab female [fweight=perwt] if age>=18 & age<=92
tab female if age>=18 & age<=92

***Is the percentage of women 18+ in Texas (51.35%) significantly higher
***than the percentage of women in the total population (50.5%)?

***Proportion test (it does not allow the use of weight or complex survey design)
***Restricted to the same adult sample as the tables above
prtest female=.505 if age>=18 & age<=92

************************************
***TWO-SAMPLE t-TEST
************************************

***Mean of wage and salary income by sex
tabstat income if income!=0 [fweight=perwt], by(sex)

***t-test of personal income by sex
***Weights are not allowed
tabstat income if income!=0, by(sex)
ttest income if income!=0, by(sex)

************************************
***TWO-SAMPLE TEST OF PROPORTIONS
************************************

***Migration status by sex
***Independent variable (sex): column
***Dependent variable (migrant): row
tab migrant sex [fweight=perwt]

***Percentage distribution is estimated
***within categories of independent variable (sex)
***Percentages add up to 100% within each sex category
tab migrant sex [fweight=perwt], col nofreq

***Proportion is tested for a dummy dependent variable
***by categories of a dummy independent variable (sex)
***Need to create one dummy variable for internal migrant vs. non-migrant
***and another for international migrant vs. non-migrant

************************************
***Internal migration
************************************

***Dummy variable for internal migration (domestic migration)
gen dommig=.
replace dommig=0 if migrant==1 // non-migrant
replace dommig=1 if migrant==2 // internal migrant
label define dommig 0 "Non-migrant" 1 "Internal migrant"
label values dommig dommig
tab migrant dommig, m

***Proportion of internal migrants by sex
tab dommig sex [fweight=perwt], col nofreq
tabstat dommig [fweight=perwt], by(sex)

***Sample size for internal migration test
count if dommig!=. & sex!=.

***Test of proportions of internal migrants by sex
***Weights are not allowed
tab dommig sex, col nofreq
prtest dommig, by(sex)

************************************
***International migration
************************************

***Dummy variable for international migration
gen intmig=.
replace intmig=0 if migrant==1 // non-migrant
replace intmig=1 if migrant==3 // international migrant
label define intmig 0 "Non-migrant" 1 "International migrant"
label values intmig intmig
tab migrant intmig, m

***Proportion of international migrants by sex
tab intmig sex [fweight=perwt], col nofreq
tabstat intmig [fweight=perwt], by(sex)

***Sample size for international migration test
count if intmig!=. & sex!=.

***Test of proportions of international migrants by sex
***Weights are not allowed
tab intmig sex, col nofreq
prtest intmig, by(sex)

************************************
***CLOSING COMMANDS
************************************

***Save data
save "$data\Stata05.dta", replace

***Save log
log close