Logistic regression models how a binary (or multinomial) response variable is related to a set of explanatory variables, which can be discrete and/or continuous.
Binary Logistic Regression
Binary logistic regression estimates the probability that a characteristic is present (e.g., the probability of "success") given the values of the explanatory variables, in this case a single categorical variable: π = Pr(Y = 1 | X = x).
Y: binary response variable
X = (X1, X2, ..., Xk): a set of explanatory variables which can be discrete, continuous, or a combination
Model
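$$\pi_i = \Pr(Y_i = 1 \mid X_i = x_i) = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)},$$
or
$$\operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_i,$$
which extends to k explanatory variables as
$$\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}.$$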
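The two forms of the model are inverses of each other, which a few lines of Python can make concrete. This is a minimal sketch for the single-predictor case; the function names and the coefficient values are hypothetical, chosen only for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Logistic function: maps a linear predictor eta to a probability.

    The two branches avoid overflow in exp() for large |eta|.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def predict_prob(beta0, beta1, x):
    """pi = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)) for one predictor."""
    return inv_logit(beta0 + beta1 * x)

# Hypothetical coefficients (not estimated from any data).
beta0, beta1 = -1.5, 0.8
pi = predict_prob(beta0, beta1, x=2.0)

# Applying the logit to pi recovers the linear predictor beta0 + beta1*x.
eta_back = logit(pi)
```

Here the linear predictor is -1.5 + 0.8 * 2.0 = 0.1, so `eta_back` should agree with 0.1 up to floating-point error.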
Assumptions
- The data Y1, Y2, ..., Yn are independently distributed, i.e., cases are independent.
- The distribution of Yi is Bin(ni, πi), i.e., the binary logistic regression model assumes a binomial distribution for the response. More generally, the response in a generalized linear model is assumed to follow a distribution from the exponential family (e.g., binomial, Poisson, multinomial, normal).
- A linear relationship is assumed between the logit of the response probability and the explanatory variables: logit(π) = β0 + βX.
Model Fit
- Overall goodness-of-fit statistics of the model: the Pearson chi-square statistic (X²), the deviance (G²), the likelihood ratio test and statistic (ΔG²), and the Hosmer-Lemeshow test and statistic
- Residual analysis: Pearson, deviance, adjusted residuals, etc.
- Overdispersion
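As an illustration of the first item, the Pearson and deviance statistics for grouped binomial data can be computed directly from the observed counts and the fitted probabilities. The sketch below assumes the usual grouped-data definitions, X² = Σ (yi − ni π̂i)² / (ni π̂i (1 − π̂i)) and G² = 2 Σ [yi log(yi / (ni π̂i)) + (ni − yi) log((ni − yi) / (ni (1 − π̂i)))]; the function names and the example numbers are hypothetical.

```python
import math

def pearson_chi2(y, n, pi_hat):
    """Pearson X^2: squared (observed - expected) successes over the binomial variance."""
    return sum(
        (yi - ni * p) ** 2 / (ni * p * (1.0 - p))
        for yi, ni, p in zip(y, n, pi_hat)
    )

def _obs_log_ratio(observed, expected):
    """observed * log(observed / expected), using the convention 0 * log(0) = 0."""
    return 0.0 if observed == 0 else observed * math.log(observed / expected)

def deviance(y, n, pi_hat):
    """Deviance G^2 comparing the fitted model with the saturated model."""
    total = 0.0
    for yi, ni, p in zip(y, n, pi_hat):
        total += _obs_log_ratio(yi, ni * p)                  # successes
        total += _obs_log_ratio(ni - yi, ni * (1.0 - p))     # failures
    return 2.0 * total

# Hypothetical grouped data: successes, trials, and fitted probabilities.
y_obs = [4, 7, 9]
n_trials = [10, 10, 10]
pi_fit = [0.5, 0.6, 0.8]

x2 = pearson_chi2(y_obs, n_trials, pi_fit)
g2 = deviance(y_obs, n_trials, pi_fit)
```

Both statistics are zero when the fitted probabilities reproduce the observed proportions exactly. A value of X² (or G²) that is large relative to its degrees of freedom is one common signal of lack of fit or overdispersion.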
Parameter Estimation
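The parameters are estimated by maximum likelihood: the maximum likelihood estimator (MLE) of β = (β0, β1, ..., βk) is the value that maximizes the binomial likelihood of the observed data. It generally has no closed form and is computed iteratively, e.g., by Newton-Raphson or iteratively reweighted least squares.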
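To show what computing the MLE can look like, here is a minimal Newton-Raphson sketch for the single-predictor model with ungrouped binary data (ni = 1 for every case). It is an illustration under those assumptions, not the routine used by any particular statistics package; the function name, the defaults, and the toy data are hypothetical. The data must not be perfectly separated, otherwise the MLE does not exist and the iteration diverges.

```python
import math

def fit_logistic(x, y, n_iter=25, tol=1e-10):
    """Newton-Raphson for logit(pi_i) = b0 + b1 * x_i with 0/1 responses y_i."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0          # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical toy data with overlapping outcomes (not separable).
x_data = [0, 1, 2, 3, 4, 5, 6, 7]
y_data = [0, 0, 1, 0, 1, 1, 0, 1]
b0_hat, b1_hat = fit_logistic(x_data, y_data)
```

At the returned estimates the score equations Σ (yi − π̂i) = 0 and Σ (yi − π̂i) xi = 0 should hold up to numerical tolerance, which gives a simple check that the iteration has converged.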
Multinomial Logistic Regression
It models how a multinomial response variable Y depends on a set of k explanatory variables, X = (X1, X2, ..., Xk).
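One common way to set up such a model, assumed here for illustration, is the baseline-category logit: each non-baseline category j gets its own linear predictor for log(πj / πJ), where J is the baseline category. The sketch below converts those linear predictors into category probabilities; the function name and all coefficient values are hypothetical.

```python
import math

def multinomial_probs(x, coefs):
    """Category probabilities under a baseline-category logit model.

    coefs holds one (intercept, slopes) pair per non-baseline category.
    The returned list is ordered [non-baseline categories..., baseline].
    """
    etas = [b0 + sum(b * xi for b, xi in zip(slopes, x)) for b0, slopes in coefs]
    etas.append(0.0)  # the baseline category's linear predictor is fixed at 0
    m = max(etas)     # subtract the maximum before exponentiating, for stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for a 3-category response and k = 2 predictors.
coefs = [(0.5, [1.0, -0.5]), (-1.0, [0.2, 0.3])]
probs = multinomial_probs([1.0, 2.0], coefs)
```

The probabilities sum to 1, and log(probs[j] / probs[-1]) recovers the linear predictor of category j. With a single non-baseline category this reduces to the binary logistic model.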