********************************** ********* Binary Models ********************************** ********************************** **** Linear Probability Model ********************************** * school.dta dataset twoway (scatter hiqual avg_ed) (lfit hiqual avg_ed) regress hiqual avg_ed lincom _b[_cons]+_b[avg_ed]*3 margins, at(avg_ed=3) margins, at(avg_ed=(3 4)) margins, at(avg_ed=(3 4)) contrast(atcontrast(r._at) wald) predict yhat twoway scatter yhat hiqual avg_ed ********************************** **** Probit Model ********************************** probit hiqual avg_ed * estimating the linear prediction (before turning it into a probability by plotting it into * the cumulative standard normal distribution function!) margins, at(avg_ed=3) predict(xb) * now look at the cumulative standard normal distribution table, or... margins, at(avg_ed=3) probit hiqual avg_ed predict yhat1 twoway scatter yhat1 hiqual avg_ed, sort ylabel(0 1) margins, at(avg_ed=(1 (0.1) 5)) marginsplot marginsplot, yline(0) probit hiqual avg_ed margins, at(avg_ed=(2 3 4)) marginsplot, yline(0) probit hiqual avg_ed margins, at(avg_ed=(2 3)) contrast(atcontrast(r._at) wald) margins, at(avg_ed=(3 4)) contrast(atcontrast(r._at) wald) ********************************** **** Logit Model ********************************** logit hiqual avg_ed di 1/(1+exp(-( -12.30054 +3*3.909635 ))) di exp( -12.30054 +3*3.909635 )/(1+exp( -12.30054 +3*3.909635 )) margins, at(avg_ed=3) logit hiqual avg_ed predict yhat2 twoway scatter yhat2 hiqual avg_ed, ylabel(0 1) ********************************** **** Comparing probit and logit model ********************************** corr yhat1 yhat2 twoway (scatter yhat1 hiqual avg_ed, ylabel(0 1)) /// (scatter yhat2 hiqual avg_ed, ylabel(0 1)), legend(order(1 "Logit" 3 "Probit")) ********************************** **** odds vs. probabilities ********************************** logit hiqual avg_ed logit hiqual avg_ed, or logistic hiqual avg_ed * interpretation: an odds ratio of 49.8 means that if you increase avg_ed by 1 unit in a given school, * that school is now 49.8 times more likely to be a school with good quality than a school with low quality logit hiqual avg_ed * by typing the below command, you have the odds ratio of increasing by 1 unit avg_ed! di exp( 3.909635 ) ********************************** **** Multiple IVs ********************************** * school.dta dataset probit hiqual avg_ed enroll meals margins, at(avg_ed=(2 3 4) (mean)_all) margins, at(avg_ed=(2 3 4) meals=2 (mean)_all) margins, at(avg_ed=(2 3 4) meals=12 (mean)_all) margins, at(avg_ed=(2 3 4) meals=(2 12 22 32) (mean)_all) marginsplot * Be careful with margins when using a non-linear model! probit hiqual avg_ed enroll meals margins, at(avg_ed=(2 3 4) (mean)_all) margins, at(avg_ed=(2 3 4)) list hiqual avg_ed enroll meals in 1/2 reg hiqual avg_ed enroll meals margins, at(avg_ed=(2 3 4) (mean)_all) margins, at(avg_ed=(2 3 4)) ********************************** **** Testing directly a quadratic relationship ********************************** * school.dta dataset * I have a theoretical expectation in my mind: poverty is going to depress hiqual * but not if a school is already poor...how to test for it? quadratic term!!! * which sign for the quadratic term is expected? probit hiqual avg_ed enroll c.meals##c.meals * predicted DV (PR Y=1) at different values of meals from 0 to 100 sum meals margins, at(meals=(0 (10) 100)) marginsplot * be careful with margins after a probit and a logit! probit hiqual avg_ed enroll c.meals##c.meals margins, at(meals=(0 (10) 100)) margins, at(meals=(0 (10) 100) (mean)_all) * contrast with OLS! reg hiqual avg_ed enroll c.meals##c.meals margins, at(meals=(0 (10) 100)) margins, at(meals=(0 (10) 100) (mean)_all) * marginal impact of increasing meals by 1 unit probit hiqual avg_ed enroll c.meals##c.meals margins, dydx(meals) at(meals=(0 (10) 100)) marginsplot, yline(0) * compare with marginal impact of increasing meals by 1 unit without quadratic term probit hiqual avg_ed enroll meals margins, dydx(meals) at(meals=(0 (10) 100)) marginsplot, yline(0) ********************************** **** Testing directly an interaction ********************************** * test the interacion between avg_ed and meals, while controlling for enroll * my theory: HK of parents is able to increase the quality of a school, but only if a * school is not really poor ********************************** **** Goodness of fit ********************************** * nes2004.dta dataset logit vote_2004 educ predict yhat * Notice that the increments are higher in the middle range of education! * not surprising given that the logit assumes a nonlinear relationship between DV and IV! tab educ, sum(yhat) nost ******** Likelihood ratio logit vote_2004 educ di -2*(-553.07398- - 509.37393) logit vote_2004 educ est store full_model logit vote_2004 if e(sample) lrtest full_model . ******** McFadden Pseudo R^2 di (-553.07398 - -509.37393)/(-553.07398) ******** fraction correctly predicted logit vote_2004 educ estat classification di (821+19)/1065 di 821/837 di 19/228 ******** Hosmer and Lemeshow's test estat gof, table table educ if vote_2004!=. estat gof, group(4) table **** Multiple IVs logit vote_2004 educ logit vote_2004 educ age * why education lower in the first model than in the second? corr educ age * age is said to be a suppressor variable, because it suppresses or attenuates education’s "true effect" on turnout *** Comparing models: the likelihood ratio test * full model logit vote_2004 educ age est store full_model * with age removed from the model logit vote_2004 educ if e(sample) lrtest full_model . *** another example with more comparisons * full model logit vote_2004 educ age income_hh est store a * with income_hh removed from the model logit vote_2004 educ age if e(sample) est store b lrtest a b, stats * with income_hh and age removed from the full model logit vote_2004 educ if e(sample) est store c lrtest a c lrtest b c *** Hosmer and Lemeshow's test: see the difference when re-grouping you IVs! logit vote_2004 educ age estat gof, table estat gof, group(4) table *** Useful command to download *** Fitstat * findit fitstat * nest2004.dta dataset logit vote_2004 educ age fitstat * Comparing nested models with fitstat logit vote_2004 educ age income_hh fitstat, saving(m1) logit vote_2004 educ age if e(sample) fitstat, using(m1)