Summaries with Multivariate Data Analysis (LU) - Coursetool

Coursetool for JoHo subscribers

 

Course: Multivariate Data Analysis - Leiden University

Studytools: Summaries per chapter - ExamTests per chapter

Announcements and current status of the tool, guide, course or book

Study Tools

Published

  • Literature summary of Multivariate Data Analysis Text Book by Leiden University

 

Test Tools

Published

  • ExamTests of Multivariate Data Analysis Text Book by Leiden University

 

Tools in Print

Published

  • Literature summary & ExamTests of Multivariate Data Analysis Text Book by Leiden University

 

Book summary per chapter

Summaries per chapter with Multivariate Data Analysis Text Book by Leiden University - Bundle


Which analysis method can be used for separate types of problems? - Chapter 0
How does multiple regression analysis (MRA) work? - Chapter 1

Pearson correlations represent the relationship between two variables in the form of a sign (positive or negative) and the strength of the relationship. A positive correlation means that when one variable increases, the other does too; a negative correlation means that the values of the variables change in opposite directions. However, if we have two or more interval variables and want to predict one of these variables (the dependent variable Y) from the other variables (the independent variables X1 to Xk), we can use regression analysis. In that case we need a formula for the optimal prediction.

Why use regression?

The main difference between regression analysis and Pearson correlations is asymmetry: predicting Y from X yields different results than predicting X from Y. A second reason to use regression is to check whether the observed values are consistent with a causal explanation proposed by the researcher. Regression itself, however, says nothing about the correctness of such causal models.

What is the essence of simple regression analysis?

The general formula for a simple regression is Y = b0 + b1X + e, where Y stands for the dependent variable and X for the independent variable. The parameters to be estimated are the intercept (b0) and the regression weight (b1). The error (e) is the difference between the predicted and the actual value of Y. The relationship between X and Y can also be shown graphically. The most commonly used method for finding an optimal prediction is the least squares method: the parameters are chosen in such a way that the sum of the squared prediction errors is as small as possible.
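As an illustration (not part of the original summary), the least squares criterion and the resulting estimates for simple regression can be written as:

```latex
% Least squares: choose b0 and b1 so that the sum of squared errors is minimal
\min_{b_0,\, b_1} \sum_{i=1}^{N} e_i^2 \;=\; \sum_{i=1}^{N} \bigl(Y_i - b_0 - b_1 X_i\bigr)^2,
\qquad
b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}.
```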

What does 'regression to the mean' refer to?

The raw scores of X and Y can be converted into standardized z-scores, which have a mean of 0 and a standard deviation of 1. In that case, regression to the mean occurs: the predicted value of Y is always closer to the mean than the corresponding value of X (unless the correlation is perfect). Regression to the mean is an important property of numerical series from variables that do not have a perfect linear relationship with each other. Substantive statements about reality derived from regression do not necessarily have to be correct: in empirical reality, there may be phenomena that push the values closer to or further from the mean, and statistics itself says nothing about such mechanisms.
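In standardized form this can be made explicit (an added illustration, not from the summary itself): the predicted z-score of Y equals the correlation times the z-score of X, so it can never lie further from the mean than z_X does:

```latex
\hat{z}_Y = r_{XY}\, z_X,
\qquad
|\hat{z}_Y| = |r_{XY}|\cdot|z_X| \;\le\; |z_X| \quad \text{(with equality only if } |r_{XY}| = 1\text{)}.
```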

What is a multiple regression analysis?

Predicting and explaining (causal) relationships can also be important if there are more than two variables. The use of multiple regression has three advantages in this area over the use of Pearson correlations.

First, it provides information about the optimal prediction of Y using a combination of X variables. In addition, we can determine how good our prediction is by looking at the total contribution of the set of predictors. Finally, we can determine how good each individual predictor is, that is, what each predictor contributes to the prediction. Note that the optimal prediction is not necessarily a correct prediction. The last advantage can be used to establish a causal relationship more clearly or to see whether adding a predictor has added value.
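A minimal sketch in Python (statsmodels) of what such an analysis looks like; the data and variable names (iq, motivation, exam) are purely hypothetical and not taken from the course:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: predict an exam score from iq and motivation
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "iq": rng.normal(100, 15, n),
    "motivation": rng.normal(5, 1, n),
})
df["exam"] = 0.05 * df["iq"] + 0.8 * df["motivation"] + rng.normal(0, 1, n)

X = sm.add_constant(df[["iq", "motivation"]])   # adds the intercept b0
model = sm.OLS(df["exam"], X).fit()

print(model.rsquared)         # total contribution of the set of predictors (R2)
print(model.params)           # b0 and the unstandardized regression weights
print(model.predict(X)[:5])   # predicted Y values for the first five cases
```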

What are multiple correlations?

The multiple correlation (R) always has a value between 0 and 1, so it cannot be negative. R2 refers to the proportion of explained variance of Y, with a higher R2 indicating a better prediction. The adjusted R2 can be used to correct for an overestimation of the explained variance. Predictors can have shared and unique variance; the unique variance can be represented by squared semi-partial correlations. Sometimes there is suppression, where the unique contribution of a variable after correction for another variable is greater than its contribution without correction. In other words, the real effect of X1 on Y was suppressed by the relationships of X1 and Y with X2.
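For reference (added here, not part of the original text), the adjusted R2 and the squared semi-partial correlation of predictor X_j can be written as:

```latex
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1},
\qquad
sr_j^{\,2} = R^2_{\text{all predictors}} - R^2_{\text{all predictors except } X_j},
```

where N is the sample size and k the number of predictors; sr_j^2 is the unique variance in Y attributable to X_j.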

How are constant and regression weights obtained?

The constant generally has no substantive meaning for psychologists and is therefore difficult to interpret. The interpretation of the regression weights can also be problematic, because the units of measurement are often arbitrary. This also makes it difficult to determine which predictor is most important. The latter problem can be solved by using standardized regression weights: these are independent of the units of measurement, so different predictors can be compared directly. A drawback is that standardized weights depend on the standard deviations within the sample, which is particularly problematic if you want to compare different studies. Regression weights are always partial, which means that they are only valid as long as all variables are included in the equation, i.e. after correcting for the effects of all other variables.
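The relation between unstandardized and standardized weights can be written as follows (an added illustration):

```latex
\beta_j = b_j \,\frac{s_{X_j}}{s_Y},
```

where s_{X_j} and s_Y are the sample standard deviations of X_j and Y; this is equivalent to running the regression on z-scores.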

How is testing done from samples to populations?

So far we have only looked at descriptive statistics. However, we can also use inferential statistics to make statements about the population from which the sample originates. An F-test can be used to determine whether the total contribution of all variables together differs from zero. To determine the unique contribution of each predictor, a t-test can be performed per predictor. The more predictors there are, the greater the chance of a Type I error.

Therefore, the general F-test is used as a kind of gatekeeper to determine whether the t-tests should be considered.
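In formula form (added for reference), the overall F-test and the per-predictor t-test are:

```latex
F = \frac{R^2 / k}{(1 - R^2) / (N - k - 1)} \quad \text{with } df = (k,\; N - k - 1),
\qquad
t_j = \frac{b_j}{SE(b_j)} \quad \text{with } df = N - k - 1.
```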

Which assumptions exist?

There are several assumptions that must be met:

  1. The dependent variable must be of interval level; predictors can be binary or interval level.

    • Strictly satisfying this assumption is virtually impossible, but it is important for correct interpretation. Most real-life studies use quasi-interval variables. Fortunately, multiple regression is generally quite robust against small deviations from the interval level.

  2. There is a linear relationship between the predictors (the X variables) and the dependent variable.

    • With standard multiple regression, only linear relationships can be found (and, for example, no curvilinear relationships). Deviations can be detected with a residual plot.

  3. The residuals (a) have a normal distribution, (b) have the same variance for every value of the linear combination of predictors, and (c) are independent of each other.

The assumption of normally distributed residuals (3a) is not very important, because regression tests are robust against violation if the sample is large enough (N > 100). Usually, this assumption is checked by visually inspecting a histogram of the residuals. The assumption of homoscedasticity (3b) must be checked, because regression is not robust against violation of this assumption; a residual plot can be used for this. The last assumption (independence of the errors, 3c) is very important but difficult to check. Fortunately, this assumption is often met by virtue of the research design. Checking assumptions always depends on the judgment of the researcher and can therefore be interpreted differently by different people.
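A minimal Python sketch (with made-up data, not from the course) of the two visual checks mentioned above: a histogram of the residuals for assumption 3a and a residual-versus-predicted plot for assumption 3b:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical data and model, only to illustrate the plots
rng = np.random.default_rng(2)
x = rng.normal(size=150)
y = 2 + 0.5 * x + rng.normal(size=150)
model = sm.OLS(y, sm.add_constant(x)).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(model.resid, bins=20)                        # roughly normal? (3a)
ax1.set_title("Histogram of residuals")
ax2.scatter(model.fittedvalues, model.resid, s=10)    # even spread around 0? (3b)
ax2.axhline(0, color="grey")
ax2.set_title("Residuals vs. predicted values")
plt.show()
```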

What is meant by multicollinearity and outliers?

Outliers are scores of three or more standard deviations above or below the mean. It is important to consider why an individual's score is an outlier in the analysis. In addition, outliers can have a disproportionate influence on regression weights. If you decide to remove outliers from the analysis, it is good to be clear about this in the report and to indicate explicitly why you have chosen to do this.
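A minimal sketch of how such a check could be done in Python; the data are hypothetical and the cutoff of 3 standard deviations follows the text:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data with one artificially extreme case
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 1 + 0.4 * x + rng.normal(size=100)
y[0] += 8                                    # plant an outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
std_resid = fit.get_influence().resid_studentized_internal
print("Cases with |standardized residual| > 3:",
      np.where(np.abs(std_resid) > 3)[0])
```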

Multicollinearity

Several problems can arise if the correlations between predictor variables are too strong. Sometimes the regression analysis yields no results at all; in other cases, the estimates are unreliable or the results are difficult to interpret. To check for multicollinearity, we look at the tolerance of each predictor (problematic if below .10) and the related VIF (problematic if above 10). There are two strategies to combat multicollinearity. Overlap between variables can be attributed to an underlying construct or latent variable; in that case, the variables can be merged into a single variable. Another strategy is based on the idea that there may be a hierarchy, meaning that some of the predictor variables are the cause of one of the other predictor variables. Which predictors are most important can then be determined empirically, theoretically and/or statistically.
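The two collinearity diagnostics mentioned above are defined as follows (added for reference), where R_j^2 is the R2 obtained by regressing predictor X_j on all other predictors:

```latex
\text{Tolerance}_j = 1 - R_j^{\,2},
\qquad
\text{VIF}_j = \frac{1}{1 - R_j^{\,2}},
```

so a tolerance below .10 corresponds exactly to a VIF above 10.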

What are step-by-step methods for regression?

The empirical method is also called stepwise regression and has two variants: forward (in which predictors are added one by one, as long as the best remaining predictor has a p-value below .05) and backward (in which all predictors are included in the analysis, after which the non-significant predictors are removed one by one). One problem is that these methods do not always yield the same results. In addition, a variable may or may not become significant at a certain step because the same set of variables is not used at each step. Step-by-step approaches are also very sensitive to chance. It is therefore usually better to choose variables on substantive (theoretical) grounds. Nevertheless, these procedures can still be used if (1) prediction is the goal, (2) the number of predictors is small compared to the number of participants, and (3) cross-validation with other samples yields similar results.
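A minimal, purely illustrative Python sketch of the forward variant described above (column names and data are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: only x1 and x2 are truly related to y
rng = np.random.default_rng(4)
n = 300
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=["x1", "x2", "x3", "x4"])
y = 0.6 * X["x1"] + 0.3 * X["x2"] + rng.normal(size=n)

selected, remaining = [], list(X.columns)
while remaining:
    # p-value of each candidate when added to the current model
    pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c]
             for c in remaining}
    best = min(pvals, key=pvals.get)
    if pvals[best] < .05:          # add as long as the best candidate is significant
        selected.append(best)
        remaining.remove(best)
    else:
        break

print("Predictors selected by forward stepwise regression:", selected)
```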

When is regression hierarchical?

Sometimes it is better not to run a single multiple regression, but a successive series of regressions. This method can be used, for example, if variables only become important once other variables have been controlled for (as with curvilinear relationships, interactions and missing data). In addition, this method is useful for testing different causal blocks of predictors.
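A minimal sketch of a two-block hierarchical regression in Python, with hypothetical variables; the second block tests whether motivation adds explained variance over age and iq:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data
rng = np.random.default_rng(5)
n = 250
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["age", "iq", "motivation"])
y = 0.4 * df["iq"] + 0.5 * df["motivation"] + rng.normal(size=n)

block1 = sm.OLS(y, sm.add_constant(df[["age", "iq"]])).fit()
block2 = sm.OLS(y, sm.add_constant(df[["age", "iq", "motivation"]])).fit()

print("R2 block 1:", round(block1.rsquared, 3))
print("R2 block 2:", round(block2.rsquared, 3))
# F-change test for the extra predictor in block 2: returns (F, p-value, df difference)
print(block2.compare_f_test(block1))
```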

How to apply regression analysis in SPSS?

When you perform a regression analysis in SPSS, you start by checking outliers and assumptions. Then you interpret the multiple correlation and related aspects. Finally, you interpret the regression weights.

How does Analysis of Variance (ANOVA) work? - Chapter 2
How does Analysis of Covariance (ANCOVA) work? - Chapter 3
How does MANOVA work? - Chapter 4
How does repeated measures ANOVA work? - Chapter 5
How does Logistic Regression Analysis (LRA) work? - Chapter 6
How does mediation analysis work? - Chapter 7
What is meant by suppression and what are spurious correlations? - Chapter 8