************************************
************************************
***BASIC DESCRIPTIVE STATISTICS (chapter 2)
************************************
************************************

************************************
***CLEAR MEMORY
************************************

clear all

***Keep only one of the two blocks below (WINDOWS or MACINTOSH):
***a second "log using" fails while a log is already open

************************************
***WINDOWS
************************************

***Start saving results window
log using "C:\course\progs\Stata02.log", replace text

***Shortcut for folders
global data = "C:\course\data"
global output = "C:\course\output"

************************************
***MACINTOSH
************************************

***Start saving results window
log using "/course/progs/Stata02.log", replace text

***Shortcut for folders
global data = "/course/data"
global output = "/course/output"

************************************
***OPENING COMMANDS
************************************

***Tell Stata not to pause for "more" messages
set more off

***Change directory
cd "$data"

***Open 2016 GSS
use "GSS2016.dta", clear

************************************
***WEIGHTS
************************************

***Fweight does not work with a continuous (non-integer) weight variable
***(capture noisily displays the error without stopping the do-file)
capture noisily tab year [fweight=wtssall], m

***Iweight maintains sample size in GSS
tab year [iweight=wtssall], m

***Aweight also works and maintains sample size
tab year [aweight=wtssall], m

************************************
***COMPLEX SURVEY DESIGN
************************************

***First-stage unit
***VSTRAT: Variance Stratum
***National Frame Areas (NFAs): one or more counties

***Second-stage unit
***VPSU: Variance Primary Sampling Unit
***Segments: block, group of blocks, or census tract

***singleunit(scaled)
***The scaling factor for each stratum with one sampling unit comes from
***the average of the variances from the strata with multiple sampling units

svyset [weight=wtssall], strata(vstrat) psu(vpsu) singleunit(scaled)

************************************
***SEX
************************************

tab sex
tab sex, m
tab sex [aweight=wtssall], m
svy: tab sex

***Generate dummy variable for female
tab sex, nolabel
generate female=.
replace female=0 if sex==1
replace female=1 if sex==2
tab sex female, m
tab female
tab female, m
svy: tab female

***Generate dummy variable for male
***(restricted to nonmissing female: !female would code missing as 0)
gen male=!female if !missing(female)
tab male female, m
tab male

************************************
***RELIGION
************************************

***Religious preference
tab relig
tab relig [aweight=wtssall]
tab relig [aweight=wtssall], m
svy: tab relig

***Generate religious variable with fewer categories
tab relig, nolabel
gen religion=.
replace religion=1 if relig==1 //protestant
replace religion=2 if relig==2 //catholic
replace religion=3 if relig==3 //jewish
replace religion=4 if relig>=5 & relig<=13 //other
replace religion=5 if relig==4 //none
tab relig religion, m
tab religion, m

***Create label for variable
label variable religion "Religious group"

***Create labels for categories
label define relcode 1 "Protestant" ///
    2 "Catholic" ///
    3 "Jewish" ///
    4 "Other" ///
    5 "None"

***Assign labels for categories of a specific variable
label values religion relcode

***New religion variable
tab religion
tab religion [aweight=wtssall]
svy: tab religion

************************************
***RACE/ETHNICITY
************************************

***Race
tab race [aweight=wtssall]
tab race [aweight=wtssall], m
svy: tab race

***Hispanic
tab hispanic, m
tab hispanic, m nolabel
svy: tab hispanic

***Generate dummy variable for hispanic
gen hisp=.
replace hisp=0 if hispanic==1
replace hisp=1 if hispanic>=2 & hispanic<=50
tab hispanic hisp, m

***Generate race/ethnicity variable
gen raceeth=.
replace raceeth=1 if race==1 & hisp==0 //non-hispanic white
replace raceeth=2 if race==2 & hisp==0 //non-hispanic black
replace raceeth=3 if hisp==1 //hispanic
replace raceeth=4 if race==3 & hisp==0 //other
tab raceeth race, m
tab raceeth hisp, m

***Create label for variable
label variable raceeth "Race/Ethnicity"

***Create labels for categories
label define racecode 1 "Non-hispanic white" ///
    2 "Non-hispanic black" ///
    3 "Hispanic" ///
    4 "Other"

***Assign labels for categories of a specific variable
label values raceeth racecode

***New race/ethnicity variable
tab raceeth, m
tab raceeth [aweight=wtssall], m
svy: tab raceeth

************************************
***AGE
************************************

***Age distribution
tab age, m
tab age [aweight=wtssall], m
tab age, m nolabel
tab age [aweight=wtssall], m nolabel
svy: tab age

***Generate age group variable - manually
gen agegr1=.
replace agegr1=18 if age>=18 & age<=24
replace agegr1=25 if age>=25 & age<=44
replace agegr1=45 if age>=45 & age<=64
replace agegr1=65 if age>=65 & age<=89
tab agegr1, m

***Note: contents() requires Stata 16 or earlier (or the "version 16:" prefix)
***In Stata 17 or later, use statistic(min age) statistic(max age) statistic(count age)
table agegr1, contents(min age max age count age)

***Generate age group variable - automatically
egen agegr2 = cut(age), at(18,25,45,65,90)
tab agegr2, m
table agegr2, contents(min age max age count age)

***Create label for variables
label variable agegr1 "Age group"
label variable agegr2 "Age group"

***Create labels for categories
label define agecode 18 "18-24" ///
    25 "25-44" ///
    45 "45-64" ///
    65 "65-89"

***Assign labels for categories of specific variables
label values agegr1 agegr2 agecode

***New age group variables
tab agegr1, m
tab agegr1 [aweight=wtssall], m
svy: tab agegr1
tab agegr2, m
tab agegr2 [aweight=wtssall], m
svy: tab agegr2

************************************
***INCOME
************************************

***Family income distribution
tab income
tab income [aweight=wtssall]
tab income, nolabel
tab income [aweight=wtssall], nolabel
svy: tab income

***Respondent income distribution
tab rincome
tab rincome [aweight=wtssall]
tab rincome, nolabel
tab rincome [aweight=wtssall], nolabel
svy: tab rincome

***Respondent income in constant dollars
***Inflation-adjusted personal income

***No weight
sum conrinc, d
mean conrinc
estat sd

***Weight
***It corrects the mean
sum conrinc [aweight=wtssall], d
mean conrinc [aweight=wtssall]

***Survey design
***It corrects the mean and standard error
svy: mean conrinc
estat sd

************************************
***PIE GRAPH
************************************

***Pie graph of religion
graph pie [aweight=wtssall], over(religion)
graph export "$output/religion_pie.png", replace //Macintosh
graph export "$output\religion_pie.png", replace //Windows

************************************
***COLUMN GRAPH
************************************

***Column graph of religion
graph bar [aweight=wtssall], over(religion) ytitle("Percent")
graph export "$output/religion_column.png", replace //Macintosh
graph export "$output\religion_column.png", replace //Windows

************************************
***HISTOGRAMS
***Histograms accept only frequency weights (fweight), so wtssall cannot be used
************************************

***Histogram of age
hist age, frequency discrete xlabel(0(10)90) xtitle("Age")
graph export "$output/age_histogram.png", replace //Macintosh
graph export "$output\age_histogram.png", replace //Windows

***Histogram of age by sex
hist age, frequency discrete by(female) xlabel(0(10)90) xtitle("Age")

***Overlaying histograms of age by sex
twoway (histogram age if female==0, frequency discrete xlabel(0(10)90) fcolor(gs11) lcolor(gs11)) ///
    (histogram age if female==1, frequency discrete xlabel(0(10)90) fcolor(none) lcolor(black) ///
    legend(order(1 "Males" 2 "Females")) ///
    xtitle("Age"))
graph export "$output/age-sex_histogram.png", replace //Macintosh
graph export "$output\age-sex_histogram.png", replace //Windows

***Histogram of income
hist conrinc, frequency xlabel(0(50000)200000) xtitle("Respondent income in constant dollars")
graph export "$output/income_histogram.png", replace //Macintosh
graph export "$output\income_histogram.png", replace //Windows

************************************
***SCATTER PLOT
***Weights don't work well with scatter plots (they change the size of the dots)
************************************

***Scatter plot of age by income
scatter conrinc age
twoway (scatter conrinc age) (lfit conrinc age)
graph export "$output/age-income_scatter.png", replace //Macintosh
graph export "$output\age-income_scatter.png", replace //Windows

************************************
***BAR GRAPH - AGE-SEX STRUCTURE
************************************

***Generate five-year age groups variable - automatically
egen age5y = cut(age), at(18,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90)
table age5y, contents(min age max age count age)

***Generate variables with male and female totals by five-year age groups
sort age5y
by age5y: egen maletotal=total(male)
by age5y: egen femaletotal=total(female)

***Replace male total by negative value
replace maletotal=-maletotal

***Age-sex structure
twoway bar maletotal age5y [aweight=wtssall], horizontal barwidth(5) fcolor(navy) lcolor(black) lwidth(medium) || ///
    bar femaletotal age5y [aweight=wtssall], horizontal barwidth(5) fcolor(maroon) lcolor(black) lwidth(medium) ///
    legend(label(1 Males) label(2 Females)) ///
    ylabel(15(5)85, angle(horizontal) valuelabel labsize(*.8)) ///
    ytitle("Age group") ///
    xlabel(-150 "150" -125 "125" -100 "100" -75 "75" -50 "50" -25 "25" 0 25 50 75 100 125 150) ///
    xtitle("Sample size") ///
    title("Age-sex structure, United States") ///
    subtitle("2016 General Social Survey")
graph export "$output/age-sex_bar.png", replace //Macintosh
graph export "$output\age-sex_bar.png", replace //Windows

************************************
***CLOSING COMMANDS
************************************

***Save data
save "Stata02.dta", replace

***Save log
log close