Multivariate logistic regression is a type of data analysis that predicts any number of[1] outcomes based on multiple[2][3] independent variables.[4][5][6] It is based on the assumption that the natural logarithm of the odds has a linear relationship with independent variables.[7]
First, the baseline odds of a specific outcome compared to not having that outcome are calculated, giving a constant (intercept).[8] Next, the independent variables are incorporated into the model, giving a regression coefficient (beta) and a "P" value for each independent variable.[9] The "P" value indicates whether the independent variable has a statistically significant effect on the odds of having the outcome or not.[10]
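The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the data set is invented, the fitting procedure is Newton's method, and the "P" values come from a Wald test. Neither the fitting procedure nor the test is specified by the text above, and the names `fit_logistic`, `score_and_information`, and `solve` are hypothetical helpers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information matrix."""
    p = [sigmoid(sum(b * x for b, x in zip(beta, xi))) for xi in X]
    k = len(beta)
    score = [sum((yi - pi) * xi[j] for xi, yi, pi in zip(X, y, p))
             for j in range(k)]
    info = [[sum(pi * (1 - pi) * xi[j] * xi[l] for xi, pi in zip(X, p))
             for l in range(k)] for j in range(k)]
    return score, info

def fit_logistic(rows, y, iterations=25):
    """Return (coefficients, Wald p-values, final score vector)."""
    X = [[1.0] + list(r) for r in rows]   # leading 1.0 is the intercept term
    beta = [0.0] * len(X[0])
    for _ in range(iterations):           # Newton updates
        score, info = score_and_information(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    score, info = score_and_information(X, y, beta)
    k = len(beta)
    p_values = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        variance = solve(info, unit)[j]   # j-th diagonal entry of info^-1
        z = beta[j] / math.sqrt(variance)
        p_values.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided
    return beta, p_values, score

# Invented data: two independent variables per observation, binary outcome.
rows = [(0.5, 0), (1.0, 1), (1.5, 0), (2.0, 1), (2.5, 0), (3.0, 1),
        (3.5, 0), (4.0, 1), (4.5, 0), (5.0, 1), (5.5, 0), (6.0, 1)]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

beta, p_values, score = fit_logistic(rows, y)
print("intercept and betas:", beta)
print("Wald p-values:", p_values)
```

Fitting the same function with no independent variables, for example `fit_logistic([()] * 4, [1, 0, 0, 0])`, returns an intercept equal to the natural logarithm of the baseline odds (here 1 to 3), which corresponds to the first step described above.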
It is desirable to use as few variables as necessary,[11] and to have at least 10 to 20 times as many observations as independent variables.[12]
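The rule of thumb above amounts to simple arithmetic, sketched here with a hypothetical helper named `recommended_observations`; the multipliers 10 and 20 are taken from the text, everything else is an assumption for illustration.

```python
def recommended_observations(n_independent_variables, low=10, high=20):
    """Return the (lower, upper) observation counts suggested by the
    10-to-20-observations-per-variable rule of thumb."""
    return (low * n_independent_variables, high * n_independent_variables)

# A model with 5 independent variables would call for roughly
# 50 to 100 observations under this rule.
lower, upper = recommended_observations(5)
print(lower, upper)
```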
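Multivariate logistic regression uses a formula similar to univariate logistic regression,[13] but with multiple independent variables:
$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_v x_v}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_v x_v}}$$
where $\pi(x)$ is the probability of the outcome, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_v$ are the regression coefficients, and v is the number of independent variables. Taking the natural logarithm of the odds shows that the model has the same form as a standard linear regression model on the log-odds scale:[14]
$$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_v x_v$$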
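As a rough numerical check of the relationship described above, the following sketch computes the probability from a linear combination of independent variables and confirms that the natural logarithm of the odds recovers that linear combination. The coefficient and input values are invented for illustration, and the function names are hypothetical.

```python
import math

def linear_predictor(intercept, betas, xs):
    """beta_0 + beta_1 * x_1 + ... + beta_v * x_v"""
    return intercept + sum(b * x for b, x in zip(betas, xs))

def probability(intercept, betas, xs):
    """Logistic function of the linear predictor: 1 / (1 + e^-eta),
    which is algebraically the same as e^eta / (1 + e^eta)."""
    eta = linear_predictor(intercept, betas, xs)
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Invented values for v = 3 independent variables.
intercept = -1.5
betas = [0.8, -0.4, 0.25]
xs = [2.0, 1.0, 3.0]

eta = linear_predictor(intercept, betas, xs)
p = probability(intercept, betas, xs)
print(eta, p, logit(p))  # logit(p) matches eta up to rounding error
```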
The two main types of regression are linear regression and logistic regression.
Linear regression produces results that show a linear relationship with a single independent variable (IV) and can be plotted on a graph as a straight line.[15]
In contrast, logistic regression produces results that show a nonlinear relationship. As a result, plotting the data on a graph produces a curved line called a sigmoid. Unlike linear regression, logistic regression produces results based on two or more independent variables.[16][17][5]
The odds ratio associated with a single independent variable can change when other independent variables are accounted for as well.[18] The changes are usually insignificant, but they can indicate errors.[19]
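The effect of accounting for a second variable can be illustrated with a deliberately extreme, invented example. Here stratifying the data by the second variable is used as a simple stand-in for adjusting within a regression model, which is an assumption made for brevity; the counts and the helper `odds_ratio` are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.
    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome"""
    return (a * d) / (b * c)

# Invented counts, split by a second variable (a risk group).
strata = {
    "low_risk": (1, 9, 10, 90),
    "high_risk": (90, 10, 9, 1),
}

# Odds ratio within each stratum (second variable accounted for).
adjusted = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Odds ratio when the second variable is ignored and the tables are pooled.
pooled = [sum(cells[i] for cells in strata.values()) for i in range(4)]
crude = odds_ratio(*pooled)

print(adjusted)  # within each stratum the exposure shows no association
print(crude)     # the pooled table suggests a strong association
```

In practice the shift is usually far smaller than in this contrived case, which is why a large change is worth investigating.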
Multivariate logistic regression assumes that the different observations are independent.[20] It also assumes that the natural logarithm of the odds and the independent variables show a linear relationship. However, it does not assume a normal distribution of the dependent variables.
A null hypothesis is an assumption that the independent variables do not have any impact on the dependent variable.[21]
There are three main types of logistic regression dependent variables (DVs): binary, multi-class, and ordinal.[22]
A binary dependent variable is a variable with only two possible outcomes, which must be mutually exclusive.[23]
A multi-class dependent variable is a variable with at least three qualitative (non-numerical) outcomes, usually with a constant numerical stand-in.[24]
An ordinal dependent variable is a variable with at least three possible outcomes that follow a numerical order.[25]
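The three kinds of dependent variable can be sketched as simple label-to-number encodings. The category labels, the numeric codes, and the helpers `encode` and `dv_type` are all invented for illustration.

```python
# Binary: exactly two outcomes.
BINARY = {"no": 0, "yes": 1}

# Multi-class: three or more qualitative outcomes; the numbers are
# arbitrary stand-ins and carry no order.
MULTI_CLASS = {"bus": 0, "car": 1, "train": 2}

# Ordinal: three or more outcomes whose codes reflect a real order.
ORDINAL = {"mild": 1, "moderate": 2, "severe": 3}

def encode(values, codes):
    """Replace each label with its numeric stand-in."""
    return [codes[v] for v in values]

def dv_type(codes, ordered):
    """Classify a coding scheme by outcome count and whether order matters."""
    if len(codes) == 2:
        return "binary"
    return "ordinal" if ordered else "multi-class"

print(encode(["yes", "no", "yes"], BINARY))
print(dv_type(MULTI_CLASS, ordered=False))
print(dv_type(ORDINAL, ordered=True))
```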
Multivariate logistic regression produces the following models:[26]
Logit models distinguish independent and dependent variables.
Unlike logit models, log-linear models do not distinguish between categories of variables.
Probit models function similarly to logit models due to the similarities of normal and logistic distributions. However, since the coefficients are interpreted in terms of standard deviations instead of odds ratios, probit models are more similar to linear models than logit models are.
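The similarity between the normal and logistic distributions mentioned above can be checked numerically. The sketch below compares the two cumulative distribution functions over a grid; the rescaling constant 1.702 is a commonly used approximation factor and is an addition here, not something stated in the text.

```python
import math

def logistic_cdf(z):
    """Cumulative distribution function behind the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal cumulative distribution function behind the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SCALE = 1.702  # assumed rescaling factor that aligns the two curves

grid = [i / 100.0 for i in range(-500, 501)]

# Largest gap between the curves without and with rescaling.
raw_gap = max(abs(normal_cdf(z) - logistic_cdf(z)) for z in grid)
scaled_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)

print(raw_gap, scaled_gap)
```

After rescaling, the two curves differ by less than one percentage point everywhere on the grid, which is why logit and probit models tend to give similar fitted probabilities even though their coefficients are on different scales.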
When scientists use logistic regression, they usually include as many independent variables as necessary.[5]
Multivariate logistic regression is used by physicians to:[27]
Multivariate logistic regression is also used to analyze customer preferences for products.[30]
Multivariate logistic regressions are also used in machine learning.[31]
Both multivariate logistic regression and multivariable logistic regression relate multiple independent variables to outcomes. However, multivariate logistic regression relates the independent variables to multiple outcomes, whereas multivariable logistic regression relates them to a single outcome.[32][3]