/***********************************
************************************
***INTRODUCTION (chapter 1)
***BASIC DESCRIPTIVE STATISTICS (chapter 2)
************************************
************************************

************************************
***GENERAL SOCIAL SURVEY (GSS) DATA
************************************
If you copy the GSS data to your computer before coming to class,
it will make everything easier.

Get the GSS resources for this course at this link:
http://www.ernestoamaral.com/docs/soci420-17fall/course.zip

Uncompress the file on your computer.
This procedure varies across computers.
Basically, it will create a folder called "course"
with two sub-folders ("data" and "docs").

Save the "course" folder in a specific location on your computer,
following these suggestions for Windows and Macintosh.

************************************
***WINDOWS
************************************
Save the uncompressed "course" folder under the C:\ drive.
You can see the C:\ drive under "Computer" or "My PC" in Windows Explorer.

************************************
***MACINTOSH
************************************
Save the uncompressed "course" folder under Macintosh HD (Hard Drive).
To show Macintosh HD on your Finder sidebar, open Finder,
click on the "Finder" menu, click on "Preferences...",
click on the "Sidebar" tab, and select "Hard disks".

************************************
***CREATE SUB-FOLDERS
************************************
You should create two other sub-folders under "course" ("progs" and "output").

Some computers might ask for your password to save files
in the C:\ drive or Macintosh HD. You might have to:
1) create an empty folder "course"
2) copy the "data" and "docs" folders to this new folder

These steps will vary from computer to computer.
If you have any questions, we will resolve them in class,
but it would help if you try these steps before coming to class.
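
************************************
***OPTIONAL: CREATE SUB-FOLDERS FROM STATA
************************************
A minimal sketch, not part of the original course steps: the two extra
sub-folders can also be created from the Stata command window with mkdir.
The "capture" prefix suppresses the error if a folder already exists.
The paths assume that "course" sits directly under C:\ (Windows) or
Macintosh HD (Macintosh), as suggested above. Run only the pair of
commands for your operating system.

Windows:
    capture mkdir "C:\course\progs"
    capture mkdir "C:\course\output"

Macintosh:
    capture mkdir "/course/progs"
    capture mkdir "/course/output"
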
************************************
***"COURSE" SUB-FOLDERS
************************************
"data": General Social Survey microdata
"docs": codebook, information on income variable, questionnaires
"progs": you should create this folder to save Stata do-files and log-files
"output": you should create this folder to save tables and figures
*/

************************************
***CLEAR MEMORY
************************************
clear all

************************************
***WINDOWS
***Run only the block (Windows or Macintosh) that matches your computer:
***a second "log using" fails while a log is already open
************************************

***Start saving results window
log using "C:\course\progs\Stata01.log", replace text

***Shortcut for folders
global data = "C:\course\data"
global output = "C:\course\output"

************************************
***MACINTOSH
************************************

***Start saving results window
log using "/course/progs/Stata01.log", replace text

***Shortcut for folders
global data = "/course/data"
global output = "/course/output"

************************************
***OPENING COMMANDS
************************************

***Tell Stata to not pause for "more" messages
set more off

***Change directory
cd "$data"

***Open 2016 GSS
use "GSS2016.dta", clear

************************************
***SAMPLE SIZE
************************************
count

***Year
tabulate year
tab year, missing
tab year, m

***Fweight does not work with a continuous (noninteger) weight variable
***(capture noisily shows the error without stopping the do-file)
capture noisily tab year [fweight=wtssall], m

***Iweight maintains sample size in GSS
tab year [iweight=wtssall], m

***Aweight also works and maintains sample size
tab year [aweight=wtssall], m

/*
************************************
***RELATIONAL OPERATORS
************************************
The relational operators are:
> (greater than)
< (less than)
>= (greater than or equal)
<= (less than or equal)
== (equal)
!= (not equal)

Observe that the relational operator for equality is a pair of equal signs.
This convention distinguishes relational equality (==) from the single
equal sign (=) used to assign values when generating a variable.
See example below...
*/

************************************
***SEX
************************************
tab sex, m
tab sex [aweight=wtssall], m

***Generate dummy variable for female
tab sex, nolabel
generate female=.
replace female=0 if sex==1
replace female=1 if sex==2
tab sex female, m

***Generate dummy variable for males
***(the if qualifier keeps male missing when female is missing;
***without it, !female would be 0 for missing values)
gen male=!female if !missing(female)
tab sex male, m
tab female male, m

*This is the same as...
gen male2=~female if !missing(female)
tab male male2, m

************************************
***RELIGION
************************************

***Religious preference
tab relig
tab relig [aweight=wtssall]
tab relig [aweight=wtssall], m

***Generate religious variable with fewer categories
tab relig, nolabel
gen religion=.
replace religion=1 if relig==1 //protestant
replace religion=2 if relig==2 //catholic
replace religion=3 if relig==3 //jewish
replace religion=4 if relig>=5 & relig<=13 //other
replace religion=5 if relig==4 //none
tab relig religion, m
tab religion, m

***Create label for variable
label variable religion "Religious group"

***Create labels for categories
label define relcode 1 "Protestant" ///
    2 "Catholic" ///
    3 "Jewish" ///
    4 "Other" ///
    5 "None"

***Assign labels for categories of a specific variable
label values religion relcode

***New religion variable
tab religion
tab religion [aweight=wtssall]

************************************
***RACE/ETHNICITY
************************************

***Race
tab race [aweight=wtssall]
tab race [aweight=wtssall], m

***Hispanic
tab hispanic, m
tab hispanic, m nolabel

***Generate dummy variable for hispanic
gen hisp=.
replace hisp=0 if hispanic==1
replace hisp=1 if hispanic>=2 & hispanic<=50
tab hispanic hisp, m

***Generate race/ethnicity variable
gen raceeth=.
replace raceeth=1 if race==1 & hisp==0 //non-hispanic white
replace raceeth=2 if race==2 & hisp==0 //non-hispanic black
replace raceeth=3 if hisp==1 //hispanic
replace raceeth=4 if race==3 & hisp==0 //other
tab raceeth race, m
tab raceeth hisp, m

***Create label for variable
label variable raceeth "Race/Ethnicity"

***Create labels for categories
label define racecode 1 "Non-hispanic white" ///
    2 "Non-hispanic black" ///
    3 "Hispanic" ///
    4 "Other"

***Assign labels for categories of a specific variable
label values raceeth racecode

***New race/ethnicity variable
tab raceeth, m
tab raceeth [aweight=wtssall], m

************************************
***AGE
************************************

***Age distribution
tab age, m
tab age [aweight=wtssall], m
tab age, m nolabel
tab age [aweight=wtssall], m nolabel

*Easy way to know minimum and maximum age values
summarize age
summarize age, detail
sum age, d

***Generate age group variable - manually
gen agegr1=.
replace agegr1=18 if age>=18 & age<=24
replace agegr1=25 if age>=25 & age<=44
replace agegr1=45 if age>=45 & age<=64
replace agegr1=65 if age>=65 & age<=89
tab agegr1, m
table agegr1, contents(min age max age count age)

***Generate age group variable - automatically
egen agegr2 = cut(age), at(18,25,45,65,90)
tab agegr2, m
table agegr2, contents(min age max age count age)

***Create label for variables
label variable agegr1 "Age group"
label variable agegr2 "Age group"

***Create labels for categories
label define agecode 18 "18-24" ///
    25 "25-44" ///
    45 "45-64" ///
    65 "65-89"

***Assign labels for categories of specific variables
label values agegr1 agegr2 agecode

***New age group variables
tab agegr1, m
tab agegr1 [aweight=wtssall], m
tab agegr2, m
tab agegr2 [aweight=wtssall], m

************************************
***INCOME
************************************

***Family income distribution
tab income
tab income [aweight=wtssall]
tab income, nolabel
tab income [aweight=wtssall], nolabel

*This is wrong, because this is an ordinal-level variable
sum income
sum income, d

***Respondent income distribution
tab rincome
tab rincome [aweight=wtssall]
tab rincome, nolabel
tab rincome [aweight=wtssall], nolabel

*This is wrong, because this is an ordinal-level variable
sum rincome
sum rincome, d

***Respondent income in constant dollars
***Inflation-adjusted personal income
sum conrinc
sum conrinc [aweight=wtssall]
sum conrinc, d
sum conrinc [aweight=wtssall], d

*This is not helpful, because this variable has too many values
tab conrinc

***Mean income by sex
table sex [aweight=wtssall], c(mean conrinc)

***Mean income by race/ethnicity
table raceeth [aweight=wtssall], c(mean conrinc)

***Mean income by age group
table agegr1 [aweight=wtssall], c(mean conrinc)

***Mean income by sex and race/ethnicity
table raceeth sex [aweight=wtssall], c(mean conrinc)

***Mean income by sex and age group
table agegr1 sex [aweight=wtssall], c(mean conrinc)

***Mean income by race/ethnicity and age group
table agegr1 raceeth [aweight=wtssall], c(mean conrinc)

***Mean income by sex, race/ethnicity, and age group
table agegr1 raceeth sex [aweight=wtssall], c(mean conrinc)

************************************
***PIE GRAPH
************************************

***Pie graph of religion
graph pie [aweight=wtssall], over(religion)
graph export "$output/religion_pie.png", replace //Macintosh
graph export "$output\religion_pie.png", replace //Windows

************************************
***COLUMN GRAPH
************************************

***Column graph of religion
graph bar [aweight=wtssall], over(religion) ytitle("Percent")
graph export "$output/religion_column.png", replace //Macintosh
graph export "$output\religion_column.png", replace //Windows

************************************
***HISTOGRAMS
***Only frequency weights (fweight) are allowed with histograms,
***so the noninteger weight wtssall cannot be used
************************************

***Histogram of age
hist age, frequency discrete xlabel(0(10)90) xtitle("Age")
graph export "$output/age_histogram.png", replace //Macintosh
graph export "$output\age_histogram.png", replace //Windows

***Histogram of age by sex
hist age, frequency discrete by(female) xlabel(0(10)90) xtitle("Age")

***Overlaying histograms of age by sex
twoway (histogram age if female==0, frequency discrete xlabel(0(10)90) fcolor(gs11) lcolor(gs11)) ///
    (histogram age if female==1, frequency discrete xlabel(0(10)90) fcolor(none) lcolor(black) ///
    legend(order(1 "Males" 2 "Females")) ///
    xtitle("Age"))
graph export "$output/age-sex_histogram.png", replace //Macintosh
graph export "$output\age-sex_histogram.png", replace //Windows

***Histogram of income
hist conrinc, frequency xlabel(0(50000)200000) xtitle("Respondent income in constant dollars")
graph export "$output/income_histogram.png", replace //Macintosh
graph export "$output\income_histogram.png", replace //Windows

************************************
***SCATTER PLOT
***Weights don't work well with scatter plots (change size of dots)
************************************

***Scatter plot of age by income
scatter conrinc age
twoway (scatter conrinc age) (lfit conrinc age)
graph export "$output/age-income_scatter.png", replace //Macintosh
graph export "$output\age-income_scatter.png", replace //Windows

************************************
***LINE GRAPH - MEAN INCOME BY AGE
************************************

***Generate variable with mean income by age
sort age
by age: egen mincage=mean(conrinc)
sum mincage, d

***Line graph of income by age
twoway line mincage age [aweight=wtssall], ytitle("Mean income")
graph export "$output/age-income_line.png", replace //Macintosh
graph export "$output\age-income_line.png", replace //Windows

************************************
***LINE GRAPH - MEAN INCOME BY AGE AND SEX
************************************

***Generate variable with mean income by age and sex
sort sex age
by sex age: egen mincagesex=mean(conrinc)
sum mincagesex, d

***Line graph of income by age and sex
twoway line mincagesex age if female==0 [aweight=wtssall] || ///
    line mincagesex age if female==1 [aweight=wtssall], ///
    legend(label(1 Males) label(2 Females)) ///
    ytitle("Mean income") ///
    xtitle("Age")
graph export "$output/age-sex-income_line.png", replace //Macintosh
graph export "$output\age-sex-income_line.png", replace //Windows

************************************
***BAR GRAPH - AGE-SEX STRUCTURE
************************************

***Generate five-year age groups variable - automatically
***(the first group covers only ages 18-19)
egen age5y = cut(age), at(18,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90)
table age5y, contents(min age max age count age)

***Generate variables with male and female totals by five-year age groups
sort age5y
by age5y: egen maletotal=total(male)
by age5y: egen femaletotal=total(female)

***Replace male total by negative value
replace maletotal=-maletotal

***Age-sex structure
twoway bar maletotal age5y [aweight=wtssall], horizontal barwidth(5) fcolor(navy) lcolor(black) lwidth(medium) || ///
    bar femaletotal age5y [aweight=wtssall], horizontal barwidth(5) fcolor(maroon) lcolor(black) lwidth(medium) ///
    legend(label(1 Males) label(2 Females)) ///
    ylabel(15(5)85, angle(horizontal) valuelabel labsize(*.8)) ///
    ytitle("Age group") ///
    xlabel(-150 "150" -125 "125" -100 "100" -75 "75" -50 "50" -25 "25" 0 25 50 75 100 125 150) ///
    xtitle("Sample size") ///
    title("Age-sex structure, United States") ///
    subtitle("2016 General Social Survey")
graph export "$output/age-sex_bar.png", replace //Macintosh
graph export "$output\age-sex_bar.png", replace //Windows

************************************
***CLOSING COMMANDS
************************************

***Save data
save "Stata01.dta", replace

***Save log
log close
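
/*
************************************
***NOTE ON THE TABLE COMMAND IN STATA 17 OR NEWER
************************************
The "table" commands in this do-file use the contents() option
(abbreviated c()), which is the syntax of Stata 16 and earlier.
Stata 17 redesigned "table", and the corresponding option there is
statistic(). The lines below are a minimal sketch of two of the income
tables in the newer syntax. They assume Stata 17 or newer and that the
variables created in this do-file are in memory. They are kept inside
this comment so that the do-file itself is unchanged; copy them to the
command window to try them.

    table agegr1 [aweight=wtssall], statistic(mean conrinc)
    table agegr1 sex [aweight=wtssall], statistic(mean conrinc)

Alternatively, the original commands should run under version control:

    version 16: table agegr1 [aweight=wtssall], c(mean conrinc)
*/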