How does multiple regression analysis (MRA) work? - Chapter 1

Pearson correlations represent the relationship between two variables in the form of a 'sign' (positive or negative) and the strength of the relationship. A positive correlation means that when one variable increases, the other does too; a negative correlation means that the values of the variables change in opposite directions. However, if we have two or more interval variables and want to predict one of them (the dependent variable Y) from the others (the independent variables X1 to Xk), we can use regression analysis. In that case we need a formula for the optimal prediction.

Why use regression?

The main difference between regression analysis and Pearson correlations is asymmetry: predicting Y from X yields different results than predicting X from Y. A second reason to use regression is to check whether the observed values are consistent with a causal explanation proposed by the investigator. Regression itself, however, says nothing about the correctness of such causal models.

What is the essence of simple regression analysis?

The general formula for a simple regression is Y = b0 + b1X + e, where Y stands for the dependent variable and X for the independent variable. The parameters to be estimated are called the intercept (b0) and the regression weight (b1). The error (e) is the difference between the predicted and the actual value of Y. The relationship between X and Y can also be shown in a graph. The most commonly used method for making an optimal prediction is the least squares method: the parameters are chosen such that the sum of the squared prediction errors is as small as possible.
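The least squares estimates can be computed directly from the means, variances and covariance of the two variables. A minimal sketch (the variable names and data below are invented for illustration):

```python
# Ordinary least squares for a simple regression Y = b0 + b1*X + e.
# Data and variable names (study_hours, exam_score) are hypothetical.

def simple_regression(x, y):
    """Return intercept b0 and slope b1 that minimize the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # b1 = covariance(X, Y) / variance(X)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # the line passes through (mean_x, mean_y)
    return b0, b1

study_hours = [2, 4, 6, 8]
exam_score = [5, 6, 6, 8]
b0, b1 = simple_regression(study_hours, exam_score)
# Predicted score for a new observation: Y_hat = b0 + b1 * X
```

Note that the regression line always passes through the point (mean of X, mean of Y), which is why b0 follows immediately once b1 is known.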

What does 'regression to the mean' refer to?

The raw scores of X and Y can be converted into (standardized) z-scores; these scores always have a mean of 0 and a standard deviation of 1. In that case there may be regression to the mean: the predicted value of Y always lies closer to the mean than the corresponding value of X. Regression to the mean is an important property of variables that do not have a perfect linear relationship with each other. Substantive statements about reality derived from regression are not necessarily correct: in empirical reality there may be phenomena that push values closer to or further from the mean, and statisticians are not concerned with that kind of empirical question.
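With standardized scores the regression equation reduces to a single multiplication, which makes regression to the mean easy to see. A sketch with an assumed correlation:

```python
# Regression to the mean with z-scores: the predicted standardized Y is
# z_y_hat = r * z_x. Whenever |r| < 1, the prediction lies closer to the
# mean (0) than the observed z_x. The correlation value is invented.

r = 0.6      # assumed correlation between X and Y
z_x = 2.0    # an observation two standard deviations above the mean

z_y_hat = r * z_x  # predicted standardized Y: 1.2
# |z_y_hat| < |z_x|: the prediction has moved toward the mean
```

Only when the correlation is perfect (|r| = 1) does the prediction stay equally far from the mean.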

What is a multiple regression analysis?

Predicting and explaining (causal) relationships can also be important if there are more than two variables. The use of multiple regression has three advantages in this area over the use of Pearson correlations.

First, it provides information about the optimal prediction of Y from a combination of X variables. Second, we can determine how good our prediction is by looking at the total contribution of the set of predictors. Finally, we can determine how good each individual predictor is, i.e. what each predictor contributes to the prediction. Note that the optimal prediction is not necessarily a correct prediction. The last advantage can be used to establish a causal relationship more clearly, or to see whether adding a predictor has added value.

What are multiple correlations?

The multiple correlation (R) always lies between 0 and 1, so it cannot be negative. R2 refers to the proportion of variance in Y that is explained, with a higher R2 indicating a better prediction. The adjusted R2 can be used to correct for an overestimate of the explained variance. Predictors can have shared and unique variance; the unique variance can be represented by squared semi-partial correlations. Sometimes there is suppression, where the unique contribution of a variable after correction for another variable is greater than its contribution without correction. In other words, the real effect of X1 on Y was suppressed by the relationships of X1 and Y with X2.
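The correction applied by the adjusted R2 depends on the sample size n and the number of predictors k. A small sketch (the R2, n and k values are invented) shows that the correction matters most when there are many predictors relative to the number of cases:

```python
# Adjusted R^2 corrects R^2 for capitalizing on chance with k predictors
# and n cases: adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).
# The numbers below are hypothetical.

def adjusted_r2(r2, n, k):
    """Shrink R^2 toward what would be expected in the population."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

small_sample = adjusted_r2(0.50, n=20, k=5)   # large correction
large_sample = adjusted_r2(0.50, n=200, k=5)  # small correction
```

With n = 20 the adjusted value drops to about .32, while with n = 200 it stays near .49: the overestimation problem is mainly a small-sample problem.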

How are constant and regression weights obtained?

The constant generally has no intrinsic meaning for psychologists and is therefore difficult to interpret. The interpretation of the regression weights can also be problematic, because the units of measurement are often arbitrary, which makes it difficult to determine which predictor is most important. The latter problem can be solved by using standardized regression weights: these are independent of the units of measurement, so different predictors can be compared directly. A drawback is that you then become dependent on the standard deviations within samples, which is particularly problematic when comparing different studies. Regression weights are always partial, which means that they are only valid as long as all variables are included in the equation, i.e. they are corrected for the effects of all other variables.
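A raw weight can be converted into a standardized weight using the standard deviations of the predictor and the dependent variable. A sketch with invented values:

```python
# Standardizing a regression weight: beta = b * (sd_X / sd_Y).
# The result is unit-free, so weights of different predictors can be
# compared. The raw weight and standard deviations below are invented.

def standardized_weight(b, sd_x, sd_y):
    """Convert a raw regression weight b into a standardized weight beta."""
    return b * sd_x / sd_y

# A raw weight of 0.45 (e.g. exam points per study hour):
beta = standardized_weight(0.45, sd_x=2.0, sd_y=1.5)
```

The same conversion also explains the drawback mentioned above: because sd_x and sd_y are sample quantities, the same raw weight yields different betas in samples with different spreads.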

How is testing done from samples to populations?

So far we have only looked at descriptive statistics. However, we can also use inferential statistics to make statements about the population from which the sample originates. An F-test can be used to determine whether the total contribution of all predictors differs from zero. To determine the unique contribution of each predictor, a t-test can be performed for each predictor. The more predictors, the greater the chance of Type I errors.

Therefore, the general F-test is used as a kind of gatekeeper to determine whether the t-tests should be considered.
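The overall F-test can be computed directly from R2, the sample size n and the number of predictors k. A hedged sketch (the values are invented):

```python
# Overall F-test on the multiple correlation:
# F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with df1 = k and df2 = n - k - 1.
# This tests whether the set of predictors explains more than zero variance.
# The R^2, n and k values below are hypothetical.

def f_statistic(r2, n, k):
    """F-ratio of explained to unexplained variance per degree of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F = f_statistic(0.40, n=50, k=3)  # df1 = 3, df2 = 46
```

The resulting F (about 10.2 here) is then compared against the critical value of the F-distribution with those degrees of freedom; only if it is significant are the individual t-tests considered, as described above.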

Which assumptions exist?

There are several assumptions that must be met:

  1. The dependent variable must be of interval level; predictors can be binary or interval level.

    • Strictly satisfying this assumption is virtually impossible, but it is important for correct interpretation. Most real-life studies use quasi-interval variables. Fortunately, multiple regression is generally quite robust to small deviations from the interval level.

  2. There is a linear relationship between the predictors (the X variables) and the dependent variable.

    • With standard multiple regression, only linear relationships can be found (and, for example, no curvilinear relationships). Deviations can be detected with a residual plot.

  3. The residuals (a) have a normal distribution, (b) have the same variance for all values of the predicted linear combinations and (c) are independent of each other.

The assumption of normally distributed residuals (3a) is not very important, because regression tests are robust against violation if the sample is large enough (N > 100). Usually this assumption is checked by visually inspecting a histogram. The assumption of homoscedasticity (3b) must be checked, because regression is not robust against violation of this assumption; a residual plot can be used for this. The last assumption (independence of errors, 3c) is very important but difficult to check; fortunately, it is often met by the research design. Checking assumptions always depends on the judgment of researchers and can therefore be interpreted differently by different people.

What is meant by multicollinearity and outliers?

Outliers are scores of three or more standard deviations above or below the mean. It is important to consider why an individual's score is an outlier in the analysis. In addition, outliers can have a disproportionate influence on regression weights. If you decide to remove outliers from the analysis, it is good to be clear about this in the report and to indicate explicitly why you have chosen to do this.
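The rule of thumb in the text (three or more standard deviations from the mean) can be sketched as a simple filter; the data below are invented:

```python
# Flagging outliers as scores more than three standard deviations from the
# mean, per the rule of thumb above. The data are hypothetical.

def flag_outliers(scores, threshold=3.0):
    """Return the scores whose |z| exceeds the threshold."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / (n - 1)) ** 0.5
    return [s for s in scores if abs(s - mean) / sd > threshold]

data = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10] * 2 + [50]
extreme = flag_outliers(data)  # only the score 50 is flagged
```

Note that an outlier inflates the very standard deviation used to detect it, so in small samples even an extreme score may not reach |z| = 3; this is one reason the decision should rest on substantive judgment and be reported explicitly.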

Multicollinearity

Several problems can arise if the correlations between predictor variables are too strong. Sometimes the regression analysis yields no results at all; in other cases the estimates are unreliable or the results are difficult to interpret. To check for multicollinearity, we look at the tolerance of each predictor (a tolerance below .10 indicates a problem) and the related VIF (a VIF above 10 indicates a problem). There are two strategies to combat multicollinearity. Overlap between variables can be attributed to an underlying construct or latent variable; in that case the variables can be merged into a single variable. Another strategy is based on the idea that there may be a hierarchy, meaning that some of the predictor variables cause one of the other predictor variables. It can then be determined which predictors are most important: empirically, theoretically and/or statistically.
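In the two-predictor case the diagnostics have a simple closed form, which makes the thresholds concrete. A sketch (the correlation value is invented):

```python
# Tolerance and VIF for a predictor, in the two-predictor case:
# tolerance = 1 - r^2, where r is the correlation with the other predictor,
# and VIF = 1 / tolerance. A tolerance below .10 (equivalently a VIF above
# 10) signals multicollinearity. The correlation below is hypothetical.

def tolerance_and_vif(r_between_predictors):
    """Return (tolerance, VIF) for one predictor given its correlation
    with the other predictor."""
    tol = 1 - r_between_predictors ** 2
    return tol, 1 / tol

tol, vif = tolerance_and_vif(0.96)
# tol is about .08 (< .10) and vif about 12.8 (> 10): problematic overlap
```

With more than two predictors, the r^2 in the tolerance formula is replaced by the R2 from regressing that predictor on all the other predictors; the thresholds stay the same.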

What are step-by-step methods for regression?

The empirical method is also called stepwise regression and has two variants: forward (in which predictors are added one by one, as long as the p-value of the next predictor is less than .05) and backward (in which all predictors are included in the analysis, after which the non-significant predictors are removed one by one). One problem is that these methods do not always yield the same results. In addition, certain variables may or may not become significant in certain steps, because the same variables are not used in each step. Step-by-step approaches are also very sensitive to chance fluctuations. It is therefore better to choose variables based on substantive (theoretical) considerations. The above procedures can still be used, however, if (1) prediction is the goal, (2) the number of predictors is small compared to the number of participants, and (3) cross-validation with other samples yields similar results.

When is regression hierarchical?

Sometimes it is better not to run a single multiple regression but a successive series of regressions. This method can be used, for example, if variables only become important when other variables have been controlled for (as with curvilinear relationships, interactions and missing data). In addition, this method is useful for testing different causal blocks.

How to apply regression analysis in SPSS?

When you perform a regression analysis in SPSS, you start by checking outliers and assumptions. Then you interpret the multiple correlation and related aspects. Finally, you interpret the regression weights.
