Summaries and study assistance with Analysing Data using Linear Models by Van den Berg – Booktool



Summaries per chapter with the 2022 edition of Analysing Data using Linear Models by Van den Berg - Bundle


What are variables, variation and co-variation? - Chapter 1



What is this chapter about?

This chapter discusses different types of variables, explains quartiles, quantiles and percentiles, and introduces the normal distribution.

What is a data matrix?

Data (plural) are facts and statistics collected together for reference or analysis. In data analysis, we almost always put data in a matrix format. Usually, the objects of the study (called units) are put in rows, and their properties (called variables) in columns. A data matrix is thus a matrix (a collection of rows and columns) that contains information on units (in the rows) in the form of variables (in the columns). An example of such a data matrix is given below. In this matrix, there are four units and two variables.

name   grade
Laura  8
Lisa   7
Luna   6
Lena   9
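
As a minimal sketch, the data matrix above could be represented in Python as a list of rows, with one dictionary per unit and one key per variable. The variable names `data_matrix`, `n_units`, `variables` and `grades` are chosen here for illustration only.

```python
# One dict per unit (row); each key is a variable (column).
data_matrix = [
    {"name": "Laura", "grade": 8},
    {"name": "Lisa", "grade": 7},
    {"name": "Luna", "grade": 6},
    {"name": "Lena", "grade": 9},
]

n_units = len(data_matrix)                      # number of rows
variables = list(data_matrix[0].keys())         # column names
grades = [row["grade"] for row in data_matrix]  # one column of the matrix

print(f"{n_units} units, variables: {variables}")
print("grades:", grades)
```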

What is meant by wide and long data formats?

Often, units of analysis are observed on multiple variables, meaning that there are several observations for every unit of analysis. These data can be stored in either a wide or a long format. In a wide format, variables are simply added to the row (unit of analysis): each new observation of the same variable on the same unit of analysis leads to a new column in the data matrix. Below is an example of a wide format.

client  depression_1  depression_2  depression_3  depression_4
1       115           110           100           95
2       105           100           103           102
3       106           105           103           103

An alternative way to store these data is to stick to one variable and add rows instead of columns. This is the long format, in which each row holds a single observation. Below is an example of a long format. Note that these are exactly the same data, only arranged differently.

client  time  depression
1       1     115
1       2     110
1       3     100
1       4     95
2       1     105
2       2     100
2       3     103
2       4     102
3       1     106
3       2     105
3       3     103
3       4     103
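
The conversion from the wide to the long format can be sketched in a few lines of Python. This is only an illustration using the depression scores from the example; the helper name `wide_to_long` and the `prefix` parameter are assumptions made for this sketch, not part of the book.

```python
# Wide format: one row per client, one column per measurement occasion.
wide = [
    {"client": 1, "depression_1": 115, "depression_2": 110, "depression_3": 100, "depression_4": 95},
    {"client": 2, "depression_1": 105, "depression_2": 100, "depression_3": 103, "depression_4": 102},
    {"client": 3, "depression_1": 106, "depression_2": 105, "depression_3": 103, "depression_4": 103},
]


def wide_to_long(rows, prefix="depression_"):
    """Turn every 'depression_<time>' column into its own row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix):
                time = int(column[len(prefix):])  # the number after the prefix
                long_rows.append(
                    {"client": row["client"], "time": time, "depression": value}
                )
    return long_rows


long_format = wide_to_long(wide)
for row in long_format:
    print(row["client"], row["time"], row["depression"])
```

Three clients with four measurements each give twelve rows in the long format.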

Which different types of measurement level exist?

Data analysis is in essence about describing how different values in one variable relate to different values in one or more other variables (co-variation). When describing such co-varying variables, linear models are an important tool. In differentiating between these different variables, one important distinction is the measurement level of the variables: numeric, ordinal or categorical. 

Numeric variables are variables that have values describing a quantity that can be measured as a number, such as how many students there are in a classroom or how many kilograms you weigh. A numeric variable can be a count variable, for instance the number of children in a classroom. A count variable can only take discrete, non-negative whole numbers: 0, 1, 2, 3, etcetera. But a numeric variable can also be a continuous variable. Continuous variables can take any value from the set of real numbers, such as weight: 60.2, 58.8, 93.2 and so on. The number of decimals can be as large as the instrument of measurement allows. Examples of continuous variables include height, time, age, blood pressure and temperature.

For numeric variables, one can further distinguish between interval variables and ratio variables. The difference is that for ratio variables the ratio between two measurement values is meaningful, and for interval variables it is not. When a variable has a fixed zero-point, it is a ratio variable. When the zero-point is arbitrary, it is an interval variable. What ratio and interval variables have in common, however, is that they are both numeric variables, expressing quantities in terms of units of measurement. This implies that the distance between 10 and 20 is the same as the distance between 30 and 40, between 40 and 50, and so forth. This distinguishes them from ordinal variables.
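
To see why ratios are not meaningful for an interval variable, consider temperature, used here as an assumed illustration that is not taken from the book. The Celsius scale has an arbitrary zero-point, whereas the Kelvin scale has a fixed one. The sketch below shows that differences survive a change of scale, but ratios do not.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (a scale with a fixed zero-point)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are the same on both scales: 10 degrees.
diff_celsius = warm_c - cool_c
diff_kelvin = warm_k - cool_k

# Ratios are not: "twice as warm" in Celsius is an artefact of the zero-point.
ratio_celsius = warm_c / cool_c
ratio_kelvin = warm_k / cool_k

print(f"difference: {diff_celsius:.1f} (Celsius) vs {diff_kelvin:.1f} (Kelvin)")
print(f"ratio: {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```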

Ordinal variables are not measured in units. However, their values do have a meaningful order. An example is clothing size: small, medium, large. Ordinal variables are usually discrete: there is not an infinite number of levels of the variable. In our example with the sizes small, medium and large, there are no meaningful other values in between these values. Categorical variables have no order at all. They describe a quality of the study objects rather than a quantity.

What are frequency tables?

A frequency table describes how often each value of a variable occurs. Below is an example of a frequency table with frequencies, proportions, cumulative frequencies and cumulative proportions (which add up to 1, that is, 100%).

age  frequency  proportion  cum_frequency  cum_proportion
0    5          0.1         5              0.1
1    10         0.2         15             0.3
2    10         0.2         25             0.5
3    20         0.4         45             0.9
4    5          0.1         50             1.0

These data can also be plotted in a frequency plot with age on the x-axis and frequency on the y-axis. Further, one could plot these data using a histogram. Histograms contain the same information as frequency plots, except that groups of values are taken together. Such a group of values is called a bin. In our example, each age could be a bin, which gives five bins, each with its own frequency.
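
The columns of the frequency table can be derived from the raw observations. The sketch below rebuilds the example table from a list of ages that was constructed to match its frequencies; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Raw observations, constructed to match the example table (50 children).
ages = [0] * 5 + [1] * 10 + [2] * 10 + [3] * 20 + [4] * 5

counts = Counter(ages)
n = len(ages)
values = sorted(counts)
frequencies = [counts[v] for v in values]
cum_frequencies = list(accumulate(frequencies))  # running total of frequencies

table = [
    {
        "age": v,
        "frequency": f,
        "proportion": f / n,
        "cum_frequency": cf,
        "cum_proportion": cf / n,
    }
    for v, f, cf in zip(values, frequencies, cum_frequencies)
]

for row in table:
    print(row)
```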

What are quartiles, quantiles and percentiles?

Quartiles (from quarter, a fourth) are used to make a division into four groups. For example, you could divide 100 children by assigning the 25% tallest children to the first group, the 25% shortest children to the last group, and the remaining 50% to two equally sized groups in the middle. Next, a quantile is the value below which a given proportion of the observations in a group fall. Finally, percentiles are very much like quantiles, except that they refer to percentages rather than proportions. Thus, the 25th percentile is the same as the 0.25 quantile, and the 0.75 quantile is the same as the 75th percentile.
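
As a small sketch, quartiles and percentiles can be computed with Python's `statistics.quantiles`. The heights are hypothetical, and the choice of `method="inclusive"` is an assumption: software packages differ in how they interpolate between observations, so other tools may give slightly different cut points.

```python
import statistics

# Hypothetical heights in cm for five children.
heights = [120, 125, 130, 135, 140]

# Three cut points divide the data into four groups (quartiles).
q1, q2, q3 = statistics.quantiles(heights, n=4, method="inclusive")

# The 25th percentile is the 25th of the 99 cut points for n=100,
# and equals the 0.25 quantile (the first quartile).
p25 = statistics.quantiles(heights, n=100, method="inclusive")[24]

print("quartiles:", q1, q2, q3)
print("25th percentile:", p25)
```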

Which three measures of central tendency are there?

There are three measures of central tendency:

  1. The mean is the average value, which can be computed by adding up all the values and dividing the sum by the number of values.
  2. The median is simply the middle value when the values are sorted. When the number of values is even, it is the average of the two middle values.
  3. The mode is the value that occurs most often.

For numeric variables, all three measures of central tendency are valuable. For ordinal variables, the mean is not meaningful, but the median and mode are. For categorical variables, only the mode is valuable. 
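
The three measures can be illustrated with Python's `statistics` module. The grades and clothing sizes below are hypothetical values chosen for this sketch.

```python
import statistics

# Hypothetical numeric variable: six grades (an even number of values).
grades = [6, 7, 7, 8, 9, 11]

mean_grade = statistics.mean(grades)      # sum of values / number of values
median_grade = statistics.median(grades)  # average of the two middle values
mode_grade = statistics.mode(grades)      # most frequent value

# Hypothetical variable without numeric values: only the mode is computed here.
sizes = ["small", "medium", "medium", "large"]
mode_size = statistics.mode(sizes)

print(mean_grade, median_grade, mode_grade, mode_size)
```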

How to use measures of variation?

Next to summarising distributions by measures of central tendency, we could summarise distributions by measures of variation. First, the range is the distance between the lowest and highest value. Suppose the lowest value is 100 and the highest value is 130; then the range is 130 - 100 = 30. Second, the interquartile range (IQR) is the distance between the first and third quartile. That is, the difference between the value below which 75% of the measurements fall and the value below which 25% of the measurements fall. Third, the sum of squares, or sum of squared deviations, is the sum of all squared deviations from the mean. The variance, then, is the sum of squared deviations divided by the number of observations, and the standard deviation is the square root of the variance. The standard deviation is often used to indicate how deviant a particular value is from the rest of the values. For example, suppose we have a mean of 100 and a standard deviation of 5. Then a score of 105 is one standard deviation away from the mean, and a score of 110 is two standard deviations away from the mean.
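
The measures of variation can be computed step by step, as in the sketch below. The scores are hypothetical (chosen so that the lowest value is 100 and the highest 130), and the quartiles again assume the `inclusive` interpolation method. The variance follows the definition given here, dividing by the number of observations; note that many software packages divide by the number of observations minus one when estimating a variance from a sample.

```python
import math
import statistics

# Hypothetical scores with lowest value 100 and highest value 130.
scores = [100, 105, 110, 115, 130]

value_range = max(scores) - min(scores)

q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

mean = statistics.mean(scores)
sum_of_squares = sum((x - mean) ** 2 for x in scores)  # squared deviations
variance = sum_of_squares / len(scores)                # divide by n
sd = math.sqrt(variance)

print("range:", value_range)
print("IQR:", iqr)
print("sum of squares:", sum_of_squares)
print("variance:", variance)
print("standard deviation:", round(sd, 2))
```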

Standard deviations are useful because they make it possible to compare values from different variables. More specifically, a standardised score can be computed by subtracting the mean from a value and dividing the result by the standard deviation. Such a z-score (also known as a standard score) gives you an idea of how far from the mean a data point is. In more technical terms, it is a measure of how many standard deviations below or above the population mean a raw score is. Z-scores are a way to compare results to a "normal" population.
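
A minimal sketch of the standardisation, using the mean of 100 and standard deviation of 5 from the earlier example. The helper name `z_score` and the second test (mean 50, standard deviation 10) are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Example from the text: mean 100, standard deviation 5.
z_105 = z_score(105, mean=100, sd=5)
z_110 = z_score(110, mean=100, sd=5)

# Hypothetical second test on a different scale: mean 50, standard deviation 10.
z_other = z_score(60, mean=50, sd=10)

print(z_105, z_110, z_other)
```

Although 110 and 60 are on different scales, their z-scores show that 110 is relatively further from its mean (two standard deviations) than 60 is from its mean (one standard deviation).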

What is a normal distribution and how can you define it using the empirical rule?

It is important to know that for a normal distribution (a bell-shaped distribution), the mean, median and mode are all the same. Moreover, about 68% of all values lie within 1 standard deviation of the mean. In addition, 5% of the values lie more than 1.96 standard deviations away from the mean (2.5% on each side). Because these percentages are expressed in numbers of standard deviations, they hold for every normal distribution, which makes it convenient to work with standardised values, that is, with the standard normal distribution.

Although tables are readily found online, it’s helpful to memorise the so-called 68 – 95 – 99.7 rule, also called the empirical rule. It says that 68% of normally distributed values are at most 1 standard deviation away from the mean, 95% of the values are at most 2 standard deviations away (more precisely, 1.96), and 99.7% of the values are at most 3 standard deviations away. In other words, 68% of standardised values are between -1 and +1, 95% of standardised values are between -2 and +2 (-1.96 and +1.96), and 99.7% of standardised values are between -3 and +3.
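
The percentages in the empirical rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of values at most k standard deviations away from the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within = {k: share_within(k) for k in (1, 1.96, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.4f}")
```

The results are close to 0.68, 0.95 and 0.997, and they show why 1.96 rather than 2 is the more precise boundary for 95%.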

How can we make inferences about a mean? - Chapter 2
How can we make inferences about proportions? - Chapter 3
What does linear modelling entail? - Chapter 4
How can we make inferences about linear models? - Chapter 5
What are categorical predictor variables? - Chapter 6
What are the assumptions of linear models? - Chapter 7
What should we do when the assumptions are not met? - Chapter 8
What does moderation entail? - Chapter 9
How do researchers use contrast in statistical analysis? - Chapter 10
How do we perform post hoc comparisons? - Chapter 11
How do we perform linear mixed modelling? - Chapter 12
How do we conduct linear mixed models for more than two measurements? - Chapter 13
What are non-parametric alternatives for linear mixed models? - Chapter 14
How is logistic regression conducted with generalised linear models? - Chapter 15
How can generalised linear models be used for count data? - Chapter 16
What does big data analytics entail? - Chapter 17