************************************
************************************
***SOCI 420: ADVANCED METHODS OF SOCIAL RESEARCH
***PARTIAL CORRELATION, MULTIPLE REGRESSION, AND CORRELATION (chapter 15)
************************************
************************************

************************************
***CLEAR MEMORY
************************************
clear all

************************************
***CREATE SHORTCUTS AND LOG FILE
************************************
***Shortcuts for folders
global codes = "H:\course\codes"
global data = "H:\course\data"
global output = "H:\course\output"
***Start saving the Results window to a log file
log using "$codes\Stata15.log", replace text

************************************
***OPENING COMMANDS
************************************
***Tell Stata not to pause for "more" messages
set more off

************************************
***APPEND DIFFERENT YEARS
************************************
***Open 2018 GSS
use "$data\GSS2018.dta", clear
***Append 2010 GSS
append using "$data\GSS2010.dta"
***Append 2004 GSS
append using "$data\GSS2004.dta"
***Verify year ("m" abbreviates the "missing" option, so both commands give the same table)
tab year, missing
tab year, m
***Complex survey design: GSS sampling weight, strata, and primary sampling units
svyset [pweight=wtssall], strata(vstrat) psu(vpsu) singleunit(scaled)

************************************
***ORDINARY LEAST SQUARES (OLS) REGRESSION
************************************
***Sample size
count
***Keep only observations with non-missing values
***(first check which missing-value codes each variable uses)
tab age, m nolabel
tab educ, m nolabel
tab conrinc, m nolabel
keep if age!=. & age!=.n & educ!=. & educ!=.a & conrinc!=. & conrinc!=.i
count
***Drop observations with missing values
***Same result as the keep command above, written as a drop condition
drop if age==. | age==.n | educ==. | educ==.a | conrinc==. | conrinc==.i
count

************************************
***OLS with income, age, and education
************************************
***Use complex survey design
svy, subpop(if conrinc!=.i): reg conrinc age educ if year==2018
***Standardized regression coefficients
***(i.e., standardized partial slopes, beta-weights)
***The beta option cannot be combined with the svy prefix
***Use pweight to keep the same sample size and estimate robust standard errors
reg conrinc age educ if year==2018 [pweight=wtssall], beta
***Use aweight to obtain the sums of squares and adjusted R-squared
***Models with pweight or the svy prefix do not report sums of squares or adjusted R-squared
reg conrinc age educ if year==2018 [aweight=wtssall]

************************************
***Determining normality
************************************
***Income (the dependent variable) is not normally distributed
hist conrinc if year==2018, freq normal
***Summary statistics of income (detail adds percentiles, skewness, and kurtosis)
sum conrinc if year==2018, d
***Generate the natural logarithm of income
gen lnconrinc = ln(conrinc)
***The log of income has a distribution closer to normal
hist lnconrinc if year==2018, freq normal

************************************
***OLS with natural logarithm of income, age, and education
************************************
***Use complex survey design
svy, subpop(if conrinc!=.i): reg lnconrinc age educ if year==2018
***Display the exponentiated coefficients directly
svy, subpop(if conrinc!=.i): reg lnconrinc age educ if year==2018, eform(Exp. Coef.)
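
***Sketch, assuming the last estimation command is the svy regression of lnconrinc
***on age and educ just above: compute the percentage effects from the stored
***coefficients instead of typing them in. eform() only changes the display, so _b[]
***still holds the log-scale coefficients. nlcom uses the delta method, so each
***percentage effect also gets a standard error and confidence interval.
***pct_age and pct_educ are illustrative labels, not names used elsewhere in this file.
nlcom (pct_age: 100*(exp(_b[age])-1)) (pct_educ: 100*(exp(_b[educ])-1))
***The 100*coefficient approximation, for comparison
di 100*_b[age]
di 100*_b[educ]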
***Standardized regression coefficients
***(i.e., standardized partial slopes, beta-weights)
***The beta option cannot be combined with the svy prefix
***Use pweight to keep the same sample size and estimate robust standard errors
reg lnconrinc age educ if year==2018 [pweight=wtssall], beta
***Use aweight to obtain the sums of squares and adjusted R-squared
***Models with pweight or the svy prefix do not report sums of squares or adjusted R-squared
reg lnconrinc age educ if year==2018 [aweight=wtssall]

************************************
***Interpret coefficients with log of income
************************************
***When x increases by one unit,
***y changes by 100*[exp(coefficient)-1] percent,
***controlling for the effects of all other independent variables
svy, subpop(if conrinc!=.i): reg lnconrinc age educ if year==2018
***Example: coefficient for age
di exp(0.0179909)
***Percentage interpretation
di 100*(exp(0.0179909)-1)
***When the coefficient is small in magnitude,
***100*coefficient is a close approximation
di 100*(0.0179909)
***Example: coefficient for years of education
di exp(0.1432998)
di 100*(exp(0.1432998)-1)
di 100*(0.1432998)

************************************
***OLS with natural logarithm of income, age, age squared, and education
************************************
***Generate age squared
gen agesq = age * age
***Use complex survey design
svy, subpop(if conrinc!=.i): reg lnconrinc age agesq educ if year==2018

************************************
***Dummy variables
************************************
************************************
***Age
************************************
***Age does not have a normal distribution
hist age if year==2018, percent normal
***Generate age group variable
***18-24; 25-34; 35-49; 50-64; 65+
***(egen cut codes each group by its lower bound: 18, 25, 35, 50, 65)
egen agegr = cut(age), at(18,25,35,50,65,90)
tabstat age, by(agegr) stat(min max count)
***Generate dummy variables for age (automatically)
tab agegr, gen(agegr)
tab agegr agegr1, m
tab agegr agegr2, m
tab agegr agegr3, m
tab agegr agegr4, m
tab agegr agegr5, m
***Browse data
browse age agegr agegr1-agegr5
***Choose reference category for age
***Use the category with the largest sample size as the reference (35-49)
tab agegr, m
***Or use an age category with a large sample and a meaningful interpretation for your problem,
***e.g., the age group with the highest average income (50-64)
tabstat conrinc, by(agegr) stat(mean count)

************************************
***Education
************************************
***Education does not have a normal distribution
hist educ if year==2018, percent normal
***Use the education group (degree) variable
tab degree, m
***Generate dummy variables for education (automatically)
tab degree, gen(educgr)
tab degree educgr1, m
tab degree educgr2, m
tab degree educgr3, m
tab degree educgr4, m
tab degree educgr5, m
***Browse data
browse educ degree educgr1-educgr5
***Choose reference category for education
***Use the category with the largest sample size as the reference (high school)
tab degree, m
***The category with the highest average income (graduate school) has a small sample
tabstat conrinc, by(degree) stat(mean count)

************************************
***OLS with log of income and dummy independent variables
************************************
***35-49 as reference group (agegr3): largest sample size
tab agegr
***High school as reference group (educgr2): largest sample size
tab degree
***Regression (the dummy for each reference group is left out)
svy, subpop(if conrinc!=.i): reg lnconrinc agegr1 agegr2 agegr4 agegr5 educgr1 educgr3 educgr4 educgr5 if year==2018
***Regression with dummies and reference categories specified within the reg command
***"i." marks a categorical variable, so Stata creates the dummies itself
***"b#." sets the reference (base) category to the value #
tab agegr
tab degree
tab degree, nolabel
svy, subpop(if conrinc!=.i): reg lnconrinc ib35.agegr ib1.degree if year==2018
***Display the exponentiated coefficients directly
svy, subpop(if conrinc!=.i): reg lnconrinc ib35.agegr ib1.degree if year==2018, eform(Exp. Coef.)
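
***Sketch, assuming the last estimation command is the svy regression with
***ib35.agegr and ib1.degree just above, and that degree uses the coding shown by
***"tab degree, nolabel" in this file (1 = high school, 2 = junior college).
***testparm runs a joint Wald test that all non-reference categories of a factor
***variable have zero coefficients, i.e., whether age group (or degree) matters at all.
testparm i.agegr
testparm i.degree
***Percentage differences relative to the reference groups, with delta-method standard errors
***pct_jc and pct_18to24 are illustrative labels, not names used elsewhere in this file
nlcom (pct_jc: 100*(exp(_b[2.degree])-1)) (pct_18to24: 100*(exp(_b[18.agegr])-1))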
************************************
***Interpret coefficients with log of income
************************************
***When x increases by one unit,
***y changes by 100*[exp(coefficient)-1] percent,
***controlling for the effects of all other independent variables
svy, subpop(if conrinc!=.i): reg lnconrinc ib35.agegr ib1.degree if year==2018
***Example: coefficient for Junior College
***compared to High School
di exp(0.0374562)
***Percentage interpretation
di 100*(exp(0.0374562)-1)
***When the coefficient is small in magnitude,
***100*coefficient is a close approximation
di 100*(0.0374562)
***The coefficient for the 18-24 age group (compared to the 35-49 age group)
***is large in magnitude, so 100*coefficient is a poor approximation
di exp(-1.406383)
di 100*(exp(-1.406383)-1)
di 100*(-1.406383)

************************************
***Standardized regression coefficients, sums of squares, and adjusted R-squared
************************************
***Standardized regression coefficients
***(i.e., standardized partial slopes, beta-weights)
***The beta option cannot be combined with the svy prefix
***Use pweight to keep the same sample size
reg lnconrinc ib35.agegr ib1.degree if year==2018 [pweight=wtssall], beta
***Use aweight to obtain the sums of squares and adjusted R-squared
***Models with pweight or the svy prefix do not report sums of squares or adjusted R-squared
reg lnconrinc ib35.agegr ib1.degree if year==2018 [aweight=wtssall]

************************************
***Export results to Word/Excel with outreg2 command
************************************
***If your Stata doesn't have the outreg2 command,
***type "ssc install outreg2" to install it.
*ssc install outreg2

************************************
***2004 model
************************************
***Coefficients
svy, subpop(if conrinc!=.i): reg lnconrinc ib35.agegr ib1.degree if year==2004
***Export to Excel
outreg2 using "$output\OLS.xls", replace excel dec(3) ctitle(2004) nodepvar
***Standardized coefficients and adjusted R-squared
***outreg2 does not allow pweight when exporting standardized coefficients, so aweight is used here
reg lnconrinc ib35.agegr ib1.degree if year==2004 [aweight=wtssall], beta
***Export to Excel (including adjusted R-squared)
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(2004, beta) nodepvar adjr2 e(r2) stat(beta)

************************************
***2010 model
************************************
***Coefficients
svy, subpop(if conrinc!=.i): reg lnconrinc ib35.agegr ib1.degree if year==2010
***Export to Excel
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(2010) nodepvar
***Standardized coefficients and adjusted R-squared
***outreg2 does not allow pweight when exporting standardized coefficients, so aweight is used here
reg lnconrinc ib35.agegr ib1.degree if year==2010 [aweight=wtssall], beta
***Export to Excel (including adjusted R-squared)
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(2010, beta) nodepvar adjr2 e(r2) stat(beta)

************************************
***2018 model
************************************
***Coefficients
svy, subpop(if conrinc!=.i): reg lnconrinc ib35.agegr ib1.degree if year==2018
***Export to Excel
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(2018) nodepvar
***Standardized coefficients and adjusted R-squared
***outreg2 does not allow pweight when exporting standardized coefficients, so aweight is used here
reg lnconrinc ib35.agegr ib1.degree if year==2018 [aweight=wtssall], beta
***Export to Excel (including adjusted R-squared)
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(2018, beta) nodepvar adjr2 e(r2) stat(beta)

************************************
***CLOSING COMMANDS
************************************
***Save data
save "$data\Stata15.dta", replace
***Save and close log
log close