Answered By: Statistical Consulting
Last Updated: Aug 23, 2016

Generalized Linear Model (GLM): a general class of linear models made up of three components: a random component, a systematic component, and a link function.

– Random component: the dependent variable (Y) and its probability distribution
– Systematic component: the set of explanatory variables (X1, ..., Xk)
– Link function: a function g of the mean, g(E(Y)), which is modeled as a linear function of the explanatory variables
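As a minimal sketch of how the three components fit together, the Python below (an illustration, not taken from any particular package) computes the linear predictor η = β0 + β1x1 + ... + βkxk from the systematic component and maps it back to the mean with the inverse of an identity, logit, or log link. The names `LINKS`, `linear_predictor`, and `mean_response`, and the coefficient values, are hypothetical.

```python
import math

# Each link g maps the mean mu to the linear predictor eta = g(mu);
# the inverse link maps eta back to the mean.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "log": (math.log, math.exp),
}


def linear_predictor(beta, x):
    """Systematic component: eta = beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def mean_response(link, beta, x):
    """Mean E(Y) = g^{-1}(eta) for the named link (hypothetical helper)."""
    _, inverse = LINKS[link]
    return inverse(linear_predictor(beta, x))


if __name__ == "__main__":
    beta = [0.5, -1.0, 2.0]  # made-up coefficients
    x = [1.0, 0.25]          # made-up covariate values; eta = 0.0 here
    for name in LINKS:
        print(name, mean_response(name, beta, x))
    # identity 0.0
    # logit 0.5
    # log 1.0
```

The same linear predictor gives a different mean under each link, which is why the choice of link is part of the model specification.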
 

Assumptions

  • Cases are independent
  • The dependent variable Yi is typically assumed to follow a distribution from an exponential family (e.g., binomial, Poisson, multinomial, normal); a normal distribution is not required
  • A GLM assumes a linear relationship between the transformed mean response, g(E(Yi)), and the explanatory variables
  • The explanatory (independent) variables can even be power terms or other nonlinear transformations of the original independent variables
  • Errors need to be independent but need NOT be normally distributed
  • Parameters are estimated by maximum likelihood estimation (MLE), so inference relies on large-sample approximations
  • Goodness-of-fit measures rely on sufficiently large samples; a heuristic rule is that no more than 20% of the expected cell counts should be less than 5
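The 20% heuristic can be checked mechanically for a two-way table. This sketch assumes the expected counts are those under the independence model, E_ij = (row i total × column j total) / n, as in the usual chi-square test; the function names and example tables are made up for illustration.

```python
def expected_counts(table):
    """Expected counts under independence (assumed): E_ij = row_i * col_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_cell_fraction(table, threshold=5.0):
    """Fraction of expected cell counts that fall below `threshold`."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def large_sample_ok(table, max_fraction=0.20):
    """Heuristic: no more than 20% of expected counts may be less than 5."""
    return small_cell_fraction(table) <= max_fraction


if __name__ == "__main__":
    print(expected_counts([[10, 20], [30, 40]]))  # [[12.0, 18.0], [28.0, 42.0]]
    print(large_sample_ok([[10, 20], [30, 40]]))  # True
    print(large_sample_ok([[1, 2], [3, 14]]))     # False: 3 of 4 expected counts < 5
```

Note that the rule is applied to expected counts, not observed ones: the second table has an observed 14, yet most of its expected counts are small.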

Examples:

Simple Linear Regression models how the mean (expected value) of a continuous response variable depends on an explanatory variable, where the index i stands for each data point:

$$E(Y_i) = \beta_0 + \beta x_i$$

  • Random component: Y is the response variable and has a normal distribution; generally we assume errors ei ~ N(0, σ²).
  • Systematic component: X is the explanatory variable (it can be continuous or discrete), and the model is linear in the parameters, β0 + βxi. Notice that with multiple linear regression, where we have more than one explanatory variable, e.g., (X1, X2, ..., Xk), we would have a linear combination of these X's in terms of the regression parameters β's, but the explanatory variables themselves could be transformed, e.g., X² or log(X).
  • Link function: identity link, η = g(E(Yi)) = E(Yi); identity because we are modeling the mean directly
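Under normal errors with the identity link, the maximum likelihood estimates of β0 and β coincide with the ordinary least-squares estimates, which have a closed form. A minimal sketch, with a hypothetical function name and made-up data lying exactly on the line y = 1 + 2x:

```python
def fit_simple_linear_regression(x, y):
    """Estimates of beta_0 and beta in E(Y_i) = beta_0 + beta * x_i.

    With errors e_i ~ N(0, sigma^2) and the identity link, the least-squares
    estimates computed here are also the maximum likelihood estimates.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx
    beta0 = y_bar - beta * x_bar
    return beta0, beta


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.0, 3.0, 5.0, 7.0]  # made-up data on the line y = 1 + 2x
    beta0, beta = fit_simple_linear_regression(x, y)
    print(beta0, beta)  # 1.0 2.0
    # Identity link: the fitted mean is the linear predictor itself.
    print([beta0 + beta * xi for xi in x])  # [1.0, 3.0, 5.0, 7.0]
```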

Binary Logistic Regression models how a binary response variable depends on a set of k explanatory variables, X = (X1, X2, ..., Xk).

$$\mathrm{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki}$$

which models the log odds of the probability of "success" as a function of the explanatory variables.

  • Random component: The distribution of Y is assumed to be Binomial(n, π), where π is the probability of "success".
  • Systematic component: X's are explanatory variables (they can be continuous, discrete, or both), and the model is linear in the parameters, e.g., β0 + β1x1i + ... + βkxki. Again, transformations of the X's themselves are allowed, as in linear regression; this holds for any GLM.
  • Link function: Logit link:

$$\eta = \mathrm{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right)$$

More generally, the logit link models the log odds of the mean, and the mean here is π. Binary logistic regression models are also known as logit models when the predictors are all categorical.
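Unlike linear regression, the logistic regression MLE has no closed form and is found iteratively. The sketch below assumes a single explanatory variable and 0/1 outcomes (Binomial with n = 1) and uses Newton-Raphson on the log-likelihood; for the canonical logit link this is equivalent to the iteratively reweighted least squares used by, e.g., R's `glm`. The function names, tolerance, and data are illustrative assumptions.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: pi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """ML estimates of beta_0, beta_1 in logit(pi_i) = beta_0 + beta_1 * x_i.

    Each y_i is 0 or 1, i.e. Binomial(1, pi_i). Uses Newton-Raphson on the
    log-likelihood; assumes the data are not perfectly separated.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0           # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0   # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5]
    y = [0, 0, 1, 0, 1, 1]  # made-up data, not perfectly separated
    b0, b1 = fit_logistic(x, y)
    print(b0, b1)
    # Estimated probability of "success" at x = 2.5 via the inverse link;
    # ≈ 0.5 because these made-up data are symmetric about x = 2.5.
    print(inv_logit(b0 + b1 * 2.5))
```

At the MLE the score equations hold, so with an intercept and the canonical logit link the fitted probabilities sum to the observed number of successes.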

The Log-linear Model models the expected cell counts as a function of the levels of categorical variables; e.g., for a two-way table the saturated model is

$$\log(\mu_{ij}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^{AB}_{ij}$$

where μij = E(nij) are the expected cell counts (the mean in each cell of the two-way table), A and B represent two categorical variables, the λ's are model parameters, and we are modeling the natural log of the expected counts.

  • Random component: The counts, which are the responses, are assumed to have a Poisson distribution.
  • Systematic component: X's are discrete variables used in the cross-classification, and the model is linear in the parameters $\lambda + \lambda^{X_1}_i + \lambda^{X_2}_j + \dots + \lambda^{X_k}_k + \dots$
  • Link function: log link, η = log(μ); log because we are modeling the log of the cell means.
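For the saturated model the ML fitted counts equal the observed counts, so the λ's can be read off directly once identifiability constraints are chosen. The sketch below assumes reference-cell (corner-point) constraints, i.e., every parameter involving the first row or first column is zero; software using sum-to-zero constraints would report different λ values but the same fitted counts. Function names and the example table are hypothetical, and all cells must be positive.

```python
import math


def saturated_loglinear(table):
    """Parameters of log(mu_ij) = lam + lamA_i + lamB_j + lamAB_ij.

    Assumes reference-cell constraints: any parameter involving the first
    row or first column is 0. In the saturated model the ML fitted counts
    equal the observed counts n_ij, so the lambdas follow directly from
    log(n_ij). All cells must be positive.
    """
    n11 = table[0][0]
    lam = math.log(n11)
    lam_a = [math.log(row[0] / n11) for row in table]
    lam_b = [math.log(n1j / n11) for n1j in table[0]]
    lam_ab = [
        [math.log(nij * n11 / (row[0] * table[0][j])) for j, nij in enumerate(row)]
        for row in table
    ]
    return lam, lam_a, lam_b, lam_ab


def fitted_counts(lam, lam_a, lam_b, lam_ab):
    """Invert the log link: mu_ij = exp(lam + lamA_i + lamB_j + lamAB_ij)."""
    return [
        [math.exp(lam + lam_a[i] + lam_b[j] + lam_ab[i][j]) for j in range(len(lam_b))]
        for i in range(len(lam_a))
    ]


if __name__ == "__main__":
    table = [[10, 20], [30, 40]]  # made-up 2x2 table of counts
    lam, lam_a, lam_b, lam_ab = saturated_loglinear(table)
    print(fitted_counts(lam, lam_a, lam_b, lam_ab))  # approximately the observed counts
    # With these constraints lamAB_22 is the log odds ratio log(n11*n22 / (n12*n21)).
    print(lam_ab[1][1], math.log(10 * 40 / (20 * 30)))
```

Because the saturated model has as many parameters as cells, it always fits perfectly; simpler log-linear models (e.g., dropping λAB) are what get tested against it.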
