************************************
************************************
***THE NORMAL CURVE (chapter 5)
************************************
************************************

************************************
***OPENING COMMANDS
************************************

***Clear memory
clear all

***Start saving the Results window to a log file
***Run only the line that matches your operating system
log using "C:\course\progs\Stata05.log", replace text // Windows
*log using "/course/progs/Stata05.log", replace text // Macintosh

************************************
***GRAPH COMMAND TO GENERATE NORMAL DISTRIBUTION
************************************

***Plot two normal distributions
***IQ scores for females and males
graph twoway (function y=normalden(x,100,10), range(40 160) lcolor(maroon) lw(medthick)) ///
  (function y=normalden(x,100,20), range(40 160) lcolor(navy) lw(medthick)), ///
  title("Normal density of IQ scores for females and males", color(black)) ///
  xtitle("IQ Units", size(medlarge)) ytitle("") xlabel(40(10)160) ///
  xscale(lw(medthick)) yscale(lw(medthick)) ///
  legend(order(1 "Females" 2 "Males")) graphregion(fcolor(white))

************************************
***AREA UNDER THE NORMAL CURVE
***"normal" shows area below Z
************************************

***Survey in a community
***Mean age = 35.5
***Standard deviation = 10

************************************
***What's the probability of finding someone
***who is younger than 44 years of age?

*Estimate Z = (x - mean) / standard deviation
di (44-35.5)/10

*Area below Z=0.85
*("di" is the abbreviation of "display"; both lines give the same result)
display normal(0.85)
di normal(0.85)

************************************
***What's the probability of finding someone
***who is older than 40 years of age?

*Estimate Z = (x - mean) / standard deviation
di (40-35.5)/10

*Area above Z=0.45
di 1-normal(0.45)

************************************
***What's the probability of finding someone
***who is younger than 22 years of age?
*Estimate Z = (x - mean) / standard deviation
di (22-35.5)/10

*Area below Z=-1.35
di normal(-1.35)

************************************
***What's the probability of finding someone
***who is between 32 and 42 years of age?

*Estimate Z = (x - mean) / standard deviation
di (32-35.5)/10
di (42-35.5)/10

*Area between Z=-0.35 and Z=0.65
di normal(0.65)-normal(-0.35)

************************************
***What's the probability of finding someone
***who is between 42 and 46 years of age?

*Estimate Z = (x - mean) / standard deviation
di (42-35.5)/10
di (46-35.5)/10

*Area between Z=0.65 and Z=1.05
di normal(1.05)-normal(0.65)

************************************
***What's the probability of finding someone
***who is above 50 years of age?

*Estimate Z = (x - mean) / standard deviation
di (50-35.5)/10

***Area above Z=1.45
di 1-normal(1.45)

************************************
***DISTRIBUTION OF INCOME
***GENERAL SOCIAL SURVEY
************************************

***Open 2016 GSS
***Run only the line that matches your operating system
use "C:\course\data\GSS2016.dta", clear // Windows
*use "/course/data/GSS2016.dta", clear // Macintosh

***Histogram of income
hist conrinc, norm percent

***Boxplot of income
graph hbox conrinc

***Quantile-normal plot of income
qnorm conrinc

***Power transformation
***q<1 (reduce positive skew)
***log(y): q=0
gen lnconrinc = ln(conrinc)

***Histogram of log of income
hist lnconrinc, norm percent

***Boxplot of log of income
graph hbox lnconrinc

***Quantile-normal plot of log income
qnorm lnconrinc

************************************
***What's the probability of finding someone
***who makes more than $50,000 per year?
***Original income variable (conrinc)
***This variable does not have a normal distribution
sum conrinc

***Log of income (lnconrinc)
***This variable has a distribution closer to normal
sum lnconrinc
*Mean = 9.95
*Standard deviation = 1.16

*$50,000 in log scale
di ln(50000)

*Estimate Z = (x - mean) / standard deviation
di (10.82-9.95)/1.16

***Area above Z=0.75
di 1-normal(0.75)

************************************
***CLOSING COMMANDS
************************************

***Save data
save "Stata05.dta", replace

***Save log
log close
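
************************************
***OPTIONAL CHECK (not part of the original exercises)
************************************
***A minimal sketch, assuming the same means and standard deviations
***used above: Z can be computed inside normal(), which avoids
***rounding Z by hand before looking up the area
***(run these lines before "log close" to keep the output in the log)

*Younger than 44 (mean age = 35.5, standard deviation = 10)
di normal((44-35.5)/10)

*Between 32 and 42 (same mean and standard deviation)
di normal((42-35.5)/10) - normal((32-35.5)/10)

*More than $50,000 (log income: mean = 9.95, standard deviation = 1.16)
di 1-normal((ln(50000)-9.95)/1.16)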