************************************
************************************
***SOCI 420: ADVANCED METHODS OF SOCIAL RESEARCH
***BASIC DESCRIPTIVE STATISTICS (chapter 2)
************************************
************************************

************************************
***CLEAR MEMORY
************************************

clear all

************************************
***CREATE SHORTCUTS AND LOG FILE
************************************

***Shortcut for folders
global codes = "H:\course\codes"
global data = "H:\course\data"
global output = "H:\course\output"

***Start saving results window
log using "$codes\Stata02.log", replace text

************************************
***OPENING COMMANDS
************************************

***Tell Stata to not pause for "more" messages
set more off

***Open 2021 GSS
use "$data\GSS2021.dta", clear

************************************
***WEIGHT VARIABLE
************************************

sum wtssnrps, d // summary statistics
hist wtssnrps, ylabel(0(5)25) percent // histogram

************************************
***USING WEIGHT
************************************

***No weight
tab sex, m

***Fweight does not work with a continuous (noninteger) weight variable
***capture noisily displays the error without stopping the do-file
capture noisily tab sex [fweight=wtssnrps], m

***Iweight maintains sample size in GSS
tab sex [iweight=wtssnrps], m

***Aweight maintains sample size
tab sex [aweight=wtssnrps], m

************************************
***COMPLEX SURVEY DESIGN
************************************

***First-stage unit
***VSTRAT: Variance Stratum
***National Frame Areas (NFAs): one or more counties

***Second-stage unit
***VPSU: Variance Primary Sampling Unit
***Segments: block, group of blocks, or census tract

***singleunit(scaled)
***For each stratum with one sampling unit, the scaling factor comes from
***the average of the variances of the strata with multiple sampling units

svyset [pweight=wtssnrps], strata(vstrat) psu(vpsu) singleunit(scaled)

/*
************************************
***RELATIONAL OPERATORS
************************************
The relational operators are:
    >   (greater than)
    <   (less than)
    >=  (greater than or equal)
    <=  (less than or equal)
    ==  (equal)
    !=  (not equal)
Observe that the relational operator for equality is a pair of equal signs.
This convention distinguishes relational equality (==) from the single
equal sign (=) used to assign values when generating a variable.
See example below...
*/

************************************
***SEX
************************************

tab sex
tab sex, m
tab sex [aweight=wtssnrps], m
tab sex [aweight=wtssnrps]
svy: tab sex

***Generate dummy variable for female
tab sex, nolabel
generate female=.
replace female=0 if sex==1
replace female=1 if sex==2
tab sex female, m
tab female
tab female, m
svy: tab female

***Create label for variable
label variable female "Sex"

***Create labels for categories
label define female 0 "Male" 1 "Female"

***Assign labels for categories
label values female female

***Generate dummy variable for male
***!female alone would code missing female as male=0, so keep missing as missing
gen male=!female if !missing(female)
tab male female, m
tab male

************************************
***RACE/ETHNICITY
************************************

***Race
tab race
tab race, m
tab race [aweight=wtssnrps], m
tab race [aweight=wtssnrps]
svy: tab race

***Hispanic
tab hispanic
tab hispanic, m
tab hispanic, m nolabel
tab hispanic [aweight=wtssnrps]
svy: tab hispanic

***Generate dummy variable for hispanic
gen hisp=.
replace hisp=0 if hispanic==1
replace hisp=1 if hispanic>=2 & hispanic<=50
tab hispanic hisp, m

***Generate race/ethnicity variable
gen raceeth=.
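***Sketch (an added check, not part of the original recode): preview the
***cells that the replace statements below will fill. Each condition joins
***relational tests (==) with the logical operator & (and). It assumes
***race and hisp are coded as shown in the tabulations above.
count if race==1 & hisp==0 //will become non-hispanic white
count if race==2 & hisp==0 //will become non-hispanic black
count if hisp==1 //will become hispanic
count if race==3 & hisp==0 //will become other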
replace raceeth=1 if race==1 & hisp==0 //non-hispanic white
replace raceeth=2 if race==2 & hisp==0 //non-hispanic black
replace raceeth=3 if hisp==1 //hispanic
replace raceeth=4 if race==3 & hisp==0 //other
tab raceeth race, m
tab raceeth hisp, m

***Create label for variable
label variable raceeth "Race/Ethnicity"

***Create labels for categories
label define racecode 1 "Non-hispanic white" ///
    2 "Non-hispanic black" ///
    3 "Hispanic" ///
    4 "Other"

***Assign labels for categories of a specific variable
label values raceeth racecode

***New race/ethnicity variable
tab raceeth, m
tab raceeth [aweight=wtssnrps], m
tab raceeth [aweight=wtssnrps]
svy: tab raceeth

************************************
***AGE
************************************

***Age distribution
tab age
tab age, m
tab age [aweight=wtssnrps], m
tab age, m nolabel
tab age [aweight=wtssnrps], m nolabel
tab age [aweight=wtssnrps]
svy: tab age

***Generate age group variable - manually
gen agegr1=.
replace agegr1=18 if age>=18 & age<=24
replace agegr1=25 if age>=25 & age<=44
replace agegr1=45 if age>=45 & age<=64
replace agegr1=65 if age>=65 & age<=89
tab agegr1, m
table agegr1, stat(min age) stat(max age) stat(count age)

***Generate age group variable - automatically
egen agegr2 = cut(age), at(18,25,45,65,90)
tab agegr2, m
table agegr2, stat(min age) stat(max age) stat(count age)

***Create label for variables
label variable agegr1 "Age group"
label variable agegr2 "Age group"

***Create labels for categories
label define agecode 18 "18-24" ///
    25 "25-44" ///
    45 "45-64" ///
    65 "65-89"

***Assign labels for categories of specific variables
label values agegr1 agegr2 agecode

***New age group variables
tab agegr1, m
tab agegr1 [aweight=wtssnrps], m
tab agegr1 [aweight=wtssnrps]
svy: tab agegr1

tab agegr2, m
tab agegr2 [aweight=wtssnrps], m
tab agegr2 [aweight=wtssnrps]
svy: tab agegr2

************************************
***RELIGION
************************************

***Religious preference
tab relig
tab relig, m
tab relig [aweight=wtssnrps], m
tab relig [aweight=wtssnrps]
svy: tab relig

***Generate religious variable with fewer categories
tab relig, nolabel
gen religion=.
replace religion=1 if relig==1 //protestant
replace religion=2 if relig==2 //catholic
replace religion=3 if relig==3 //jewish
replace religion=4 if relig>=5 & relig<=13 //other
replace religion=5 if relig==4 //none
tab relig religion, m
tab religion, m

***Create label for variable
label variable religion "Religious group"

***Create labels for categories
label define relcode 1 "Protestant" ///
    2 "Catholic" ///
    3 "Jewish" ///
    4 "Other" ///
    5 "None"

***Assign labels for categories of a specific variable
label values religion relcode

***New religion variable
tab religion
tab religion [aweight=wtssnrps]
svy: tab religion

************************************
***INCOME
************************************

***Family income distribution
tab income16
tab income16 [aweight=wtssnrps]
tab income16, nolabel
tab income16 [aweight=wtssnrps], nolabel
svy: tab income16
tab income16, m

***Respondents' income distribution
tab rincom16
tab rincom16 [aweight=wtssnrps]
tab rincom16, nolabel
tab rincom16 [aweight=wtssnrps], nolabel
svy: tab rincom16
tab rincom16, m

***Respondents' income in constant dollars
***Inflation-adjusted personal income

***No weight
sum conrinc, d
mean conrinc
estat sd

***Weight
***It corrects the mean
sum conrinc [aweight=wtssnrps], d
mean conrinc [aweight=wtssnrps]

***Survey design
***It corrects the mean and standard error
svy: mean conrinc
estat sd

************************************
***PIE GRAPH - RELIGION
************************************

***Pie graph of religion
graph pie [aweight=wtssnrps], over(religion)

***Pie graph of religion with values for each pie slice
graph pie [aweight=wtssnrps], over(religion) plabel(_all percent, format(%9.1fc) placement(center) size(large))

***Save graph
graph export "$output\religion_pie.png", replace

************************************
***COLUMN GRAPH - RELIGION
************************************

***Column graph of religion
graph bar [aweight=wtssnrps], over(religion) ytitle("Percent")

***Column graph of religion with values above columns
graph bar [aweight=wtssnrps], over(religion) ytitle("Percent") blabel(total, format(%9.1fc) position(outside) size(small))

***Save graph
graph export "$output\religion_column.png", replace

************************************
***HISTOGRAMS - AGE & INCOME
***Only frequency weights are allowed with histograms,
***so the noninteger GSS weight cannot be used here
************************************

***Histogram of age
hist age, frequency discrete xlabel(0(10)90) xtitle("Age")

***Save graph
graph export "$output\age_histogram.png", replace

***Histogram of age by sex
hist age, frequency discrete by(female) xlabel(0(10)90) xtitle("Age")

***Overlaying histograms of age by sex
twoway (histogram age if female==0, frequency discrete xlabel(0(10)90) fcolor(gs11) lcolor(gs11)) ///
    (histogram age if female==1, frequency discrete xlabel(0(10)90) fcolor(none) lcolor(black) ///
    legend(order(1 "Males" 2 "Females")) ///
    xtitle("Age"))

***Save graph
graph export "$output\age-sex_histogram.png", replace

***Histogram of income
hist conrinc, frequency xlabel(0(50000)200000) xtitle("Respondent income in constant dollars")

***Save graph
graph export "$output\income_histogram.png", replace

************************************
***BOXPLOT - INCOME
************************************

***Vertical boxplot of income
graph box conrinc [aweight=wtssnrps], ytitle("Respondents' income")

***Horizontal boxplot of income
graph hbox conrinc [aweight=wtssnrps], ytitle("Respondents' income")

***Save graph (exports the horizontal boxplot, the last graph drawn)
graph export "$output\income_boxplot.png", replace

************************************
***SCATTER PLOT - INCOME VERSUS AGE
***Weights don't work well with scatter plots (they change the size of dots)
************************************

***Scatter plot of income by age
scatter conrinc age
twoway (scatter conrinc age) (lfit conrinc age)

***Save graph
graph export "$output\age-income_scatter.png", replace

************************************
***BAR GRAPH - AGE-SEX STRUCTURE
************************************

***Generate five-year age groups variable - automatically
egen age5y = cut(age), at(18,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90)
table age5y, stat(min age) stat(max age) stat(count age)

***Generate variables with male and female totals by five-year age groups
sort age5y
by age5y: egen maletotal=total(male)
by age5y: egen femaletotal=total(female)

***Replace male total by negative value
replace maletotal=-maletotal

***Age-sex structure
twoway bar maletotal age5y [aweight=wtssnrps], horizontal barwidth(5) fcolor(navy) lcolor(black) lwidth(medium) || ///
    bar femaletotal age5y [aweight=wtssnrps], horizontal barwidth(5) fcolor(maroon) lcolor(black) lwidth(medium) ///
    legend(label(1 Males) label(2 Females)) ///
    ylabel(15(5)85, angle(horizontal) valuelabel labsize(*.8)) ///
    ytitle("Age group") ///
    xlabel(-150 "150" -125 "125" -100 "100" -75 "75" -50 "50" -25 "25" 0 25 50 75 100 125 150) ///
    xtitle("Sample size") ///
    title("Age-sex structure, United States") ///
    subtitle("2021 General Social Survey")

***Save graph
graph export "$output\age-sex_bar.png", replace

************************************
***CLOSING COMMANDS
************************************

***Save data
save "$data\Stata02.dta", replace

***Close log
log close