Summary with Research Methods in Psychology: Evaluating a World of Information - Morling - 3rd edition


About Research Methods in Psychology 4th edition by Morling - Chapter

The book Research Methods in Psychology is divided into six parts. The first part introduces how we can develop and use scientific reasoning. It examines what research producers and consumers are, how research works, which sources of research there are and what constitutes reliability and validity. Part two covers the foundations of research, such as ethical guidelines and how to measure something properly. The third part of the book provides insight into how we can evaluate frequency claims and discusses surveys, interviews, observational research and how to properly draw a sample from the population. Part four builds on this and discusses the evaluation of association claims based on bivariate and multivariate correlational research. Part five reviews the evaluation of causal claims based on experiments and the different research designs we can adopt when we set up an experiment. Finally, part six investigates how we can balance the different interests within research and looks at quasi-experiments, replication and how to communicate research results in the right way.

Research Methods in Psychology is written by Beth Morling. She is a professor at the University of Delaware and her expertise is research methods within psychology. She wrote the book to ensure that all students learn to look critically at research, as she believes this is an important skill to have for any job. This fourth edition of the book has additional information compared to the third edition. For example, pieces of text have been added about 'Merton's standards of science', why we can't prove theories, self-plagiarism and the influence of sample size on how precise an estimate is. In addition, almost all the example studies from the third edition have been replaced by newer examples, to keep up with the changes within the field of psychological research.

What is the psychological way of thinking? - Chapter 1

Psychology is based on research. Psychologists can be seen as scientists and empiricists. Empiricists base their conclusions on systematic observations. Psychologists base their ideas about behaviour on studies they have conducted with animals or people in their natural habitat or in an artificial setting. If you want to think like a psychologist, you must think like a researcher.

Who are producers and consumers in research?

Psychology students who are interested in conducting research, looking at questionnaires, researching animals, researching the brain or other subjects from psychology, are called producers in research. These students will probably publish papers and work as a researcher or professor. Of course, there are psychology students who do not want to work in a laboratory, but who like reading about research with animals and humans. These students are consumers in research. They read about studies and apply the things they have read in their work field. These students can become therapists, advisors, counsellors or teachers. In practice, many psychologists have both roles. They are producers and consumers in research.

As a psychology student, it is important to know how to be a good producer of research. Even if you do not plan to do a PhD after graduating, you will have to write a Bachelor thesis and a Master thesis, and these have to comply with the APA standards. These APA standards mostly concern how to reference an article in your thesis. In the text of your thesis, you have to cite the author(s) and the year of publication of the article. In your reference list, you have to write down the name(s) of the author(s), followed by the year of publication, the title of the article, the title of the journal in which it was published, the volume of the journal and, lastly, the page numbers. According to the APA standards, you must use Times New Roman in font size 12 with double line spacing (2.0). You will probably also have to take courses in which you conduct research. It is therefore important to know how to randomly allocate participants to conditions and how to read graphs.

However, most psychology students do not become researchers. Therefore, it is important to be a good consumer of research. You will need to read about research, understand it, learn from it and ask good questions about it. Most of the information psychologists look up on the internet is based on research. There are also many newspapers and popular magazines that write about research. However, only some of the studies that are conducted are accurate and useful. It is important to know how to distinguish good studies from bad ones, and knowledge about research methods helps with this. Therapists need to interpret published studies correctly in order to keep track of new techniques and effective therapies. They need to follow evidence-based treatments, which are treatments that have been supported by research.

How do scientists approach their job?

Scientists are empiricists who observe the world systematically. They also test their theories with studies and change their theories in accordance with the data they find. Scientists approach both applied research (aimed at problems from everyday life) and basic research (meant to contribute to our overall knowledge) in an empirical fashion. Scientists keep digging deeper. They do not stop after they have found one effect. When a scientist finds an effect, he/she wants to conduct another study to find out why, when and for whom the effect applies. Also, scientists publish their findings in scientific journals and share their findings with the media.

How do empiricists approach their job?

Empiricists do not base their conclusions on intuition, personal experience or casual observations. They base their conclusions on systematic evidence from their senses or from instruments that assist these senses (like questionnaires, photographs or a thermometer). They want to be systematic and they want their work to be independently verifiable by other researchers.

What is the theory-data cycle?

The theory-data cycle means that scientists collect data to test, change and update their theories. For example, when babies learn to crawl, they often follow their mother. Baby monkeys also seem very attached to their mothers and often hold on to the mother’s fur. Psychologists want to know why animals are so attached to their mothers. One of the theories is the so-called cupboard theory. This theory proposes that mothers are important for babies because they are a source of food. The babies receive food from their mothers and they feel happy about it. After a while, the babies will get a happy feeling by just seeing the mother. An alternative theory proposes that babies are attached to their mothers because the mother gives them a feeling of comfort. This is called the contact comfort theory. Harlow tested both theories in a lab. He built two artificial mother monkeys out of wire mesh. One of the mother monkeys had only the mesh and a bottle of milk, so she provided food but no comfort. The other mother monkey was covered with a warm cloth, so she provided comfort but no food. Harlow let the baby monkeys spend time with the fake mother monkeys and measured how long the babies spent with each mother. The baby monkeys spent much more time with the warm mother than with the mother who gave food. This result supports the contact comfort theory.

What are theories, hypotheses and data?

A theory has claims that concern the relationship between variables. Theories lead to specific hypotheses. A hypothesis can be seen as a prediction. It says something about what the researchers expect to observe if their theory is correct. One single theory can have many hypotheses. Data is a set of observations. Data can support a theory or undermine it.

What are the traits of good scientific theories?

The best theories are supported by data, falsifiable and parsimonious. Good theories need to be supported by data. They also have to be falsifiable. That means that a theory must lead to hypotheses that, when tested, could turn out not to support the theory. Also, a theory needs to be as simple as possible. If two theories explain the data equally well, but one theory is simpler than the other, then one has to choose the simpler theory. It is also important to keep in mind that data do not prove theories. You are allowed to say that the data support the theory or that the data are consistent with the theory, but you are not allowed to say that a finding proves that a theory is correct.

What is the difference between applied and basic research?

Applied research concerns practical problems. Scientists hope that their findings will be directly applied to solve real-life problems. Basic research is conducted to enhance overall knowledge about a certain topic. An example of this is researching the motivation of depressed people. It is often the case that basic research is used to conduct applied research later on. Translational research is the use of knowledge from basic research to test and develop applications for psychotherapy, health care and other forms of treatment. In fact, translational research can be seen as a bridge between basic and applied research.

Do researchers go further?

Every study leads to new questions. A study can find a simple effect, but the researcher probably also wants to know why this effect occurs, when it occurs and what the boundary conditions for its occurrence are. This means that the researcher needs to develop a new study in order to answer these questions.

How is scientific work published?

Scientists publish their research in scientific journals. These journals usually come out once a month. An article will only be published after it has been reviewed by experts. When you send your article to a journal, the editor will send it to three or four experts in that field. These experts will tell the editor about the strong and weak parts of the article. They can also indicate how interesting the finding is and whether the question has been researched previously. The editor has to decide whether the article will be published or not. This is a rigorous process. The experts stay anonymous, which allows them to give their honest opinions about the article. It is their task to make sure that only interesting and well-conducted studies are published. After the article has been published, other scientists can send in their comments if they do not agree with something in the article. Scientists can also cite each other’s work and do further research on that topic.

How does scientific work end up in newspapers?

Articles in scientific journals are read by other scientists or students. The general public usually does not read these articles. Popular magazines are not written by experts or scientists, but nowadays many magazines have sections about scientific research. These articles are written in a more understandable way and are also shorter than the original scientific article. Psychologists do profit from their work being covered in a popular magazine. The general public can read about what it is that psychologists do and can learn more about a certain subject. However, journalists do not always choose the most important story, but rather the most sensational one. Also, not all journalists understand a scientific article fully, because they have not been trained to read these articles. An example is an article that was published about the happiness of people from different cities in the United Kingdom. The scientific article reported that people in Edinburgh were the least happy, but that this finding was not statistically significant. Journalists who wrote about this research did not take statistical significance into account. Many magazine articles were published about people in Edinburgh being miserable, even though this finding was not significant. The researcher tried to explain that the finding was not significant and he hoped that the journalists would set the story straight. Unfortunately, the journalists did not want to hear anything about statistical significance.

What are sources of information in psychological research? - Chapter 2

Research or own experience?

When people make decisions, they often rely on their own experiences. If you have had a bad experience with a particular car brand, you will probably not buy that brand again. People often also rely on the experiences of relatives. Why should you not trust your own experience or the experience of someone you know?

Why is it important to have comparison groups?

There are many reasons why beliefs should not be based on experiences. One of those reasons is that experiences do not have a comparison group. With a comparison group, one can look at what happens with and without the variable that is being studied. In order to draw conclusions about a particular effect, you need to compare groups with each other. One must look at the treated/improved group, the treated/unimproved group, the untreated/improved group and the untreated/unimproved group. With these groups, you can estimate the effect of the treatment by comparing the proportion of people who improved with treatment to the proportion of people who improved without treatment. When you only look at your own experiences, you have no comparison group. You only look at one person and that is yourself. Only research provides us with a systematic comparison.
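The four-group comparison described above can be sketched in a few lines of Python. This is only an illustration: the counts are invented and do not come from the book, and the function name `improvement_rate` is a hypothetical helper.

```python
# Hypothetical counts for the four groups (invented for illustration only).
counts = {
    ("treated", "improved"): 30,
    ("treated", "unimproved"): 20,
    ("untreated", "improved"): 15,
    ("untreated", "unimproved"): 35,
}


def improvement_rate(counts, group):
    """Proportion of people in `group` who improved."""
    improved = counts[(group, "improved")]
    total = improved + counts[(group, "unimproved")]
    return improved / total


treated_rate = improvement_rate(counts, "treated")      # 30 out of 50
untreated_rate = improvement_rate(counts, "untreated")  # 15 out of 50
difference = treated_rate - untreated_rate

print(f"treated: {treated_rate:.2f}, untreated: {untreated_rate:.2f}, "
      f"difference: {difference:.2f}")
```

Looking only at the 30 treated people who improved would make the treatment seem impressive. The comparison with the untreated group shows how much of that improvement would probably have happened anyway.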

Why is experience confounded?

Many things happen simultaneously in our everyday life and therefore, it is quite hard to draw conclusions based on experiences. If a change takes place, you will not know for certain what has caused the change. In our daily lives, there are many possible explanations for an outcome. In research these alternative explanations are called confounds. A confound has occurred when you think that one variable has caused a change, but other things have changed as well, so you cannot say for sure what the cause of that change was. It is difficult to isolate variables in everyday life. In research it is possible to control variables and change one variable at a time.

Why is research better than experience?

Hypotheses can be tested by using controlled and systematic comparisons. Studies can use a confederate. That is a person who works with the researcher but pretends to be a participant. In a controlled study, researchers can create conditions in which there is at least one comparison group.

What is problematic about research?

Research is usually more accurate than individual experiences, but sometimes our experiences contradict the findings of a study. Personal experiences are powerful and people ascribe a lot of meaning to them. Sometimes your own experience can be an exception to the effects found in research. Does this experience show that the findings were incorrect? No, it does not, because research is probabilistic. That means that a researcher does not expect his/her findings to explain all cases all the time. The conclusions from a study explain a part of all possible cases. Research can predict that there is a high chance of something happening, but that does not mean that it will always happen.

Research or intuition?

People also often base their conclusions on intuition. We often think that our intuition is reliable, but our intuition can lead to less effective decisions. That is because most people are not scientific thinkers and are therefore biased (although scientists can be biased too). A bias can be either cognitive or motivational.

How is intuition distorted?

Usually, our intuitions are distorted because our brain does not work perfectly. People are sometimes easily persuaded by a story that sounds plausible but in fact is not true. One cognitive bias is that we accept a story because it sounds logical. Another example of a cognitive bias is the availability heuristic. This means that things that come to mind easily steer our thoughts. Usually, these are things that happened recently or that are vivid. Some subjects get more attention from the media and because of this, we might think that these things occur more often. An example of this is the number of plane versus car accidents. In the news, there are more stories about plane crashes than about car crashes and therefore, one might conclude that more people die each year from plane crashes than from car crashes. The availability heuristic can result in overestimating how often things occur. Another problem is that people often do not look for negative information. We usually look at the things that are present, but not at the things that are absent. This is called the present/present bias. This bias leads us to think more about cases in which both the treatment and the outcome were present than about cases in which the treatment was absent but the outcome was present.

How can motivation distort intuition?

Sometimes people do not want to change their ideas. Because they do not want to change their beliefs, they may only look at information that coincides with their beliefs. Sometimes we can steer our thinking by asking questions that result in answers that fit our way of thinking. This is called confirmatory hypothesis testing. This is not a scientific way of conducting research. Questions are asked that confirm a hypothesis, but no questions are asked that disconfirm the hypothesis. People are also biased about being biased. Even if people know of the existence of biases, they often think that they are not biased. Thinking that you are not victim to a bias is called the bias blind spot. This bias can result in one believing that he/she is correct, but believing in something is not a scientific way of thinking.

Can we trust authority figures?

Before you take the advice of an authority figure, you must ask yourself where his/her ideas are coming from. Has this person compared the different conditions in an objective and systematic way? If this person refers to scientific research, you can be more sure that he/she is correct. Remember that authority figures can also base their conclusions on intuitions and experiences. Also remember that not all studies have been conducted in a correct manner.

Where can we find and read about research?

Where and how can we find articles about scientific research?

Your conclusions should be based on research, but where do you find these scientific articles? Most psychologists publish their work in three different kinds of sources. Usually, the work is published in scientific journals. Psychologists can also publish their work as a chapter in an edited book. Scientists can also write a whole book about their research.

Most scientific journals are published once a month or once every three months. You can find these journals in the university library or online. The articles in these journals are empirical articles or review articles. Empirical articles report the results of a study for the first time. These articles discuss the methods, statistical tests and results of a study. A review article gives a summary of many published studies on a particular subject. Sometimes a review article uses a meta-analysis. This analysis combines the results of multiple studies into one measure of effect size. Scientists appreciate meta-analyses because these analyses weight each study proportionally. Before empirical and review articles can be published, they must be reviewed by experts. These journals are read by other scientists and students.
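The idea of combining several studies into one effect size can be illustrated with a small Python sketch. It assumes that each study is weighted by its sample size, which is a simplification chosen here for illustration (real meta-analyses typically use inverse-variance weights). The effect sizes and sample sizes are invented, and `weighted_mean_effect` is a hypothetical helper.

```python
# Each tuple is (effect size, sample size) for one hypothetical study.
studies = [(0.20, 50), (0.50, 100), (0.35, 250)]


def weighted_mean_effect(studies):
    """Average the effect sizes, weighting each study by its sample size."""
    total_n = sum(n for _, n in studies)
    return sum(effect * n for effect, n in studies) / total_n


def unweighted_mean_effect(studies):
    """Plain average that treats every study as equally informative."""
    return sum(effect for effect, _ in studies) / len(studies)


print("weighted:  ", weighted_mean_effect(studies))
print("unweighted:", unweighted_mean_effect(studies))
```

Because the largest study contributes the most participants, it pulls the weighted average towards its own effect size, whereas the plain average lets a small study count just as much as a large one.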

A so-called edited book consists of multiple chapters about the same subject, but the chapters have been written by different authors. The editor invites other scientists to write one or more chapters for the book. These books often summarize the work on a particular subject. The chapters are not reviewed as strictly as a scientific article, but the editor only invites scientists who know a lot about the subject to write a chapter. These books are also read by other psychologists and students. Psychologists can also publish their work in an entire book. However, that does not happen often.

One of the most used databases for psychological articles is PsycINFO. PsycINFO is updated weekly by the APA. In PsycINFO you can find articles about a certain subject, but you can also look for all the articles by a particular author. PsycINFO also shows how often an article has been cited and by which articles. An alternative to PsycINFO is Google Scholar. Not all articles in Google Scholar are free and Google Scholar is not as well organized as PsycINFO.

Where and how can you read about research?

Some students have trouble reading scientific articles. Especially at the beginning of their studies, it is a challenge. Most scientific articles are written in the same way. They have the same parts, reported in the same order: abstract, introduction, methods, results, discussion and references. One piece of advice is not to read every single word, but to read the article with a goal. You want to know what the main argument is and what the evidence for or against this argument is. It is therefore important to read the abstract first. At the end of the introduction, you can find the hypotheses. You can also find a summary of the hypotheses and the results at the beginning of the discussion. After you have read about the hypotheses, you can start with the introduction. Chapters from books and review articles do not have the specific sections that empirical articles have. Still, you need to ask yourself what the argument is and what the evidence for or against the argument is.

The best thing is to read about research in scientific articles. However, the work of psychologists is also discussed in non-scientific sources. A bookstore often has a psychology section. There you can find books that have been written for people who have not studied psychology. These books are written to help people, to entertain people and to earn money. The language in these books is easier than in scientific articles. In order to find out whether such a book is based on scientific articles, one must look at the notes (usually at the end of the book). The notes show on which articles the work is based. Books that do not have references should not be taken seriously.

Wikipedia can be a source of information, but it is not always reliable. Some psychological phenomena have their own Wikipedia page, but that does not mean that everything that is stated there is reliable. Anyone can edit a Wikipedia page and use whatever sources he or she wants. Sources are often stated, but these are only the sources that the writer decided to use. People who write Wikipedia pages are often enthusiastic, but not always experts on that particular subject.

What are the interrogation tools for consumers? - Chapter 3

What are variables?

Variables are an important part of research. A variable is something that can vary, so it needs to have at least two levels. A constant is something that could vary, but that has only one level in a study. In research, every variable is either measured or manipulated. A measured variable is a variable whose levels are observed and recorded. Examples are IQ, sex and blood pressure. In order to measure abstract variables (such as depression and stress), researchers often use questionnaires. A manipulated variable is a variable that the researcher controls. This is done by assigning participants to different levels of the variable. Some variables, like gender, can only be measured and not manipulated. Some variables are not allowed to be manipulated, because it would be unethical. People are not allowed to be assigned to conditions in which they would experience a great amount of emotional pain. Other variables can be either measured or manipulated.

Every variable can be described in two ways. Conceptual variables are abstract concepts, such as intelligence. These variables are also called constructs. They have to be defined carefully, and these definitions are called conceptual definitions. To test hypotheses, researchers need to create operational definitions of variables. To operationalize means to turn a concept into a variable that can be measured or manipulated. For example, the conceptual variable shyness can be operationalized as a structured set of questions. Sometimes it is difficult to operationalize concepts.

What are the three psychological claims?

A claim is an argument someone makes. Psychologists make claims based on research. There are three different claims: frequency, association, and causal claims.

Frequency claims describe the frequency of a variable. This is expressed as a number or a percentage. These claims state how often something occurs. Frequency claims are always about one variable, and this variable is always measured and never manipulated. Association claims state that the level of one variable is associated with the level of another variable. Variables that are associated are said to correlate. Association claims are about at least two variables, and the variables are measured, not manipulated. There are three different types of association: no association, a positive association and a negative association. A positive association means that a high level of one variable goes together with a high level of the other variable, and a low level of one variable goes together with a low level of the other variable. This is called a positive correlation. A negative association means that a high level of one variable goes together with a low level of the other variable. No association means there is no correlation between the two variables. A correlation can be depicted in a scatterplot. If the line through the points goes up, there is a positive correlation. If the line goes down, there is a negative correlation. A horizontal line shows that there is no correlation.
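A positive and a negative association can be made concrete with a short Python sketch that computes the Pearson correlation coefficient by hand. The data are invented for illustration, and the variable names and the helper `pearson_r` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / sqrt(ss_x * ss_y)


# Invented example data for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]   # tends to rise with hours studied
errors_made = [9, 8, 6, 5, 2]       # tends to fall with hours studied

r_positive = pearson_r(hours_studied, exam_score)
r_negative = pearson_r(hours_studied, errors_made)

print("hours vs score: ", r_positive)   # a value above 0
print("hours vs errors:", r_negative)   # a value below 0
```

The sign of the coefficient corresponds to the direction of the line in the scatterplot, and its distance from 0 indicates how strong the association is.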

Associations can help us make predictions. These predictions are mathematical and are not necessarily predictions about the future. The predictions can be used to make estimates more accurate. The stronger the relation between the variables (the closer the correlation gets to 1 or -1), the more accurate the predictions will be.

Causal claims state that one variable causes the other. Causal claims always begin with an association, but they go further. These claims often use verbs such as ‘cause’ and ‘heighten’. To go from association to causality, a study must meet three criteria. It first needs to show that the two variables correlate. Then it needs to show that the causal variable precedes the other variable. Lastly, it needs to establish that there is no other explanation for the relationship that was found (so, that there is no third variable that influences the relationship between the two variables). Unfortunately, not all claims that are published in magazines are based on research.

What are the four validities and how are they used?

Consumers of research need to evaluate claims using different validity criteria. Validity refers to the correctness of a conclusion. When a claim is valid, it is said to be accurate. Psychologists do not just state if a claim is valid or not, they look at the different types of validities and report that.

How are frequency claims evaluated?

In order to evaluate frequency claims, one must look at construct validity and external validity. You can also look at statistical validity. Construct validity concerns how well the conceptual variables have been operationalized, in other words how well a study has measured or manipulated a variable. The different levels of a variable need to correspond to real differences.

External validity is about generalizability. Which participants have been used and how well do these participants represent the population? When you want to say something about Dutch people, you have to look at different people from the population. You cannot just look at the middle class, but you should also look at the lower and upper class. Statistical validity looks at whether the statistical conclusions are accurate. In frequency claims, the statistical validity often concerns the margin of error. If there is a margin of error of 3 percentage points and a study claims that 26% of the population is unhappy, then the actual percentage is probably between 23% and 29%.
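The margin-of-error arithmetic above can be written out as a short Python sketch. The first function only adds and subtracts the margin from the estimate. The second function uses the common normal-approximation formula for a proportion from a simple random sample. The sample sizes are assumptions made for illustration and are not taken from the book, and both function names are hypothetical.

```python
from math import sqrt


def interval(estimate, margin):
    """Lower and upper bound of an estimate plus or minus its margin of error."""
    return estimate - margin, estimate + margin


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p in a sample of size n."""
    return z * sqrt(p * (1 - p) / n)


low, high = interval(26, 3)
print(low, high)  # 23 29

# Assumed sample sizes: a larger sample gives a smaller margin of error.
for n in (250, 1000, 4000):
    print(n, margin_of_error(0.26, n))
```

The loop illustrates why sample size matters for precision: under this formula, quadrupling the sample size halves the margin of error.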

How are association claims evaluated?

In association claims, one also looks at construct and external validity. Association claims are about two variables, so you should look at the construct validity of both variables. If the construct validity of one variable is not good, then you should not base your conclusions on this variable. You can also look at statistical validity. One of the aspects of statistical validity is the strength of the relationship. One must look at how strong the association between the variables is. The association between someone’s height and shoe size is strong, but the association between someone’s hair colour and income is weak. One must also look at the statistical significance of an association, because some reported associations have come about by chance. You must also be aware of two types of errors that can arise with statistical validity. A study may, based on its data, conclude that there is an association between variables, while there is in fact no association between these variables. This is called a false alarm or Type I error. A study can also conclude that there is no association between two variables, while in fact there is one. This is called a miss or Type II error. It takes training to recognize these two errors, and you will receive this training in statistics courses.
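The two error types can be summarized as a small decision table. The Python sketch below is only a memory aid: the function name `classify_conclusion` and its parameters are hypothetical, and in practice the true state of the world is unknown, which is exactly why these errors are hard to spot.

```python
def classify_conclusion(study_finds_association, association_really_exists):
    """Label a study's conclusion by comparing it with the (normally unknown) truth."""
    if study_finds_association and not association_really_exists:
        return "Type I error"
    if not study_finds_association and association_really_exists:
        return "Type II error"
    return "correct conclusion"


# Print all four combinations of conclusion and truth.
for finds in (True, False):
    for exists in (True, False):
        print(f"study finds association: {finds}, "
              f"association exists: {exists} -> "
              f"{classify_conclusion(finds, exists)}")
```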

How are causal claims evaluated?

In order to evaluate causal claims, one must look at the three criteria for causation: covariance, temporal precedence and internal validity. Covariance means that there is an association between two variables. Temporal precedence means that one variable occurs before the other variable. The variable that, according to the research, influences the other variable should come first. Internal validity concerns whether another (third) variable could explain the relationship between the two studied variables. Covariance is about the study's results, while temporal precedence and internal validity are determined by the study's method. If you want your research to be internally valid, you must control for other variables. To test causal claims, researchers conduct experiments. The manipulated variable is called the independent variable and the measured variable is called the dependent variable. To manipulate a variable means to assign some of your participants to one condition and others to another condition. This needs to be done randomly, otherwise you will not be able to control for third variables.
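Random assignment to conditions can be sketched with Python's standard `random` module. This is one simple way to do it, not a procedure prescribed by the book: the participant labels, the condition names and the helper `randomly_assign` are all hypothetical.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle the participants and deal them out over the conditions in turn."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy, so the original list is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        condition = conditions[index % len(conditions)]
        groups[condition].append(person)
    return groups


participants = [f"P{i}" for i in range(1, 9)]  # eight hypothetical participants
groups = randomly_assign(participants, ["treatment", "control"], seed=42)

for condition, members in groups.items():
    print(condition, members)
```

Because chance alone decides who ends up in which condition, third variables such as age or motivation are expected to be spread roughly evenly over the groups, and the group sizes differ by at most one.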

Construct, external and statistical validities should also be inspected for causal claims. You need to know whether the variables have been manipulated appropriately. You also want to know whether the results can be generalized to another population or setting. You also want to know how strong the relationship between these two variables is.

Which of the four validities is most important? That depends on the situation. All validities are important, but a study cannot be perfect. Most researchers have a hard time taking all four validities into account. They have to prioritize some of these validities, and their choice should be based on the goals of the research. If you want to conduct a study by calling people in the Netherlands and you want your results to generalize to the Dutch population, you must call people from all twelve provinces. Such a study needs to prioritize external validity.

Variables are an important part of research. A variable is something that can vary, so it needs to have at least two levels. A constant is something that could vary in principle, but has only one level in the study at hand. In research, every variable is either measured or manipulated. A measured variable is a variable whose levels are observed and recorded. Examples are IQ, sex and blood pressure. In order to measure abstract variables (such as depression and stress), researchers often use questionnaires. A manipulated variable is a variable that the researcher controls. This is done by assigning participants to different conditions of the variable. Some variables, like gender, can only be measured and not manipulated. Some variables are not allowed to be manipulated, because it would be unethical. For example, people may not be assigned to conditions in which they would experience a great amount of emotional pain. Other variables can be either measured or manipulated.

What are the ethical guidelines for psychological research? - Chapter 4

Nowadays, psychologists need to consider ethical guidelines when conducting studies with human or animal participants. In the past, psychologists had different ideas about their ethical interactions with participants.

What are some ethical violations from the past?

Tuskegee syphilis research

At the end of the 1920s, many people in the south of the United States were concerned that approximately 35% of black men from the south had syphilis. At that time, the disease was not curable and because of the disease, many people could not work normally or contribute to society. The only treatment was an infusion of a toxic metal. Even when this treatment worked, it had serious or even fatal side-effects.

In 1932, the U.S. Public Health Service (PHS) decided to work together with the Tuskegee Institute and they conducted a study in which 600 black men participated. 400 of these men had syphilis and 200 were healthy. Scientists wanted to examine the effects of untreated syphilis in the long term. Most participants were enthusiastic about the research, because they thought they would receive free health care. They were not told that the study was actually about syphilis. The study lasted 40 years and researchers intended to follow the men who had syphilis until they died.

The men were not told they had syphilis, but that they had ‘bad blood.’ They were also told that they would be treated and that they had to come to the institute in order to be evaluated and tested. The men were never treated for their disease and sometimes the researchers even conducted dangerous procedures. To ensure that all participants would come to the institute, the researchers lied and told the men they had to come to get a special free treatment. 250 men from the study wanted to join the army during World War II. These men were tested by the army and after they had been diagnosed with syphilis, they were told that they should get treatment and enlist again after they had recovered. The researchers ignored the army's advice and did not treat the men for their disease. As a result, these men could not join the army and missed out on the benefits that come with military service.

In 1943, the PHS approved penicillin as a treatment for syphilis, but the researchers of the study did not tell the participants anything about this new treatment. Only in 1972, when someone had complained to the media about the research, did the researchers stop the study. Many men became ill during the research and some even passed away. Some men infected their wives and children.

Nowadays we call the decisions the researchers made at that time unethical. These choices can be categorized according to three categories. First, the participants were not treated respectfully. The researchers had lied and withheld information. Because of that, the participants could not actually agree to participate in the study. If they had known beforehand everything about the study, they might have chosen to not participate. Secondly, participants were mistreated. They were not told there was a new cure for the disease and they were exposed to painful tests. Lastly, the participants were part of a disadvantaged group. Everyone can get syphilis, but the researchers only wanted to use black participants in their study.

What were the problems of the Milgram study?

In the 1960s, Milgram conducted a series of studies about obedience to authority. In his study, one participant was the teacher and another person was supposedly the pupil. The teacher had to give an electric shock to the pupil when the pupil answered a question incorrectly. The teacher could not see the pupil. The shocks became stronger with every incorrect answer. At a certain point, the pupil started screaming that he was in a lot of pain and wanted to stop with the study. After that, he did not respond anymore. The researcher, in his white lab coat, told the teachers that they had to go on giving the electric shocks. Participants who protested were calmly told by the researcher to proceed with the shocks. The results showed that 65% of the participants obeyed the researcher and gave (supposedly) fatal shocks to another person when they were told to do so. Of course, the pupil did not actually receive the electric shocks, but the teacher participants had to believe that he did.

The first problem is that being the teacher caused too much stress. The second problem is the lasting effects of the study. After the study, participants were told that everything was fake and that the idea behind the study was to see whether people were obedient to authority. Some participants were devastated by the thought that they had been willing to hurt another person. Some researchers think that Milgram should have intervened when he saw that the teacher participants were distressed. Other researchers think that we have learned much about obedience thanks to the Milgram studies. It is therefore sometimes difficult to decide whether a study is unethical or not. Often, we have to weigh the potential benefits of a study against its potential risks.

What are the most important ethical principles?

After World War II, agreements about ethical guidelines in medical research were made (the Declaration of Helsinki). In the US, the ethical systems are based on the Belmont Report. Physicians, scientists and philosophers came together to discuss how participants should be treated. Three ethical principles for decisions were formulated: respect for participants, beneficence and justice.

Respect means that the participants should be treated as autonomous agents. The participants should decide whether to be part of the study or not. Beforehand, the participants need to know what the study is about and what the risks of the study are. Only then can someone decide to participate. Researchers are not allowed to influence a participant into participating. Some groups need to be protected in deciding to participate. These groups are children, people with intellectual or developmental disabilities and prisoners.

The principle of beneficence means that researchers beforehand need to know whether participants are at risk during the research and what the benefits of the research could be. Researchers must also look at the risks and benefits of populations. This needs to be done before a study is allowed to start. Researchers are not allowed to withhold new treatments from participants. Sometimes it is hard to estimate how much psychological or emotional pain a study might cause a participant.

The principle of justice asks for a balance between the people who participate in a study and the people who benefit from it. The researchers of the Tuskegee syphilis study made an ethical error, because they only studied black Americans and not white Americans. They disadvantaged one group. When only one ethnic group is studied, the researcher needs to show that the problem that is studied is only present or mostly present in this group.

What are the guidelines for psychologists?

What are the five general principles?

The guidelines from the Belmont Report are not the only guidelines American psychologists can follow. The guidelines of the American Psychological Association (APA) concern the roles psychologists can have: research scientist, educator or practitioner (therapist). There are five general APA principles: respect, beneficence, justice, integrity, and fidelity and responsibility (the last two are seen as one principle). The first three are the same as in the Belmont Report. Integrity means, for example, that educators need to teach their students accurate things and that therapists need to stay up to date with the empirical evidence for therapeutic techniques. Fidelity and responsibility means, for example, that psychologists are not allowed to have sexual relations with their students or clients and that teachers are not allowed to have their students as clients.

What are the ten specific standards?

The APA also has ten specific standards that can be seen as rules. Psychologists who do not comply with these rules can lose their practitioner's licence. Ethical Standard 8 is the most important standard for researchers. The other standards are more important for educators and therapists. Ethical Standard 8 is explained in more detail below.

Standard 8.01 states that there should be an institutional review board (IRB). The board needs to decide whether research is conducted in an ethical way or not. Before a researcher is allowed to conduct a study with participants, he or she needs to fill in an application and send it to the board. In this application, the researcher needs to describe in detail how he or she wants to conduct the study and what the risks and benefits of the study will be. The members of the board decide whether a study is allowed to be conducted or not. Standard 8.02 states that many studies need to obtain informed consent. This is done with a paper form or a web page on which the study objectives, benefits and risks are stated. It also states whether the data will be treated anonymously, and the participants use it to indicate that they agree to take part in the study. For naturalistic observation studies in low-risk settings, no informed consent is needed. The IRB decides whether a study needs informed consent or not.

Standard 8.07 is about deception. Sometimes researchers do not tell participants everything about the study, or they lie to them. According to some researchers, it is sometimes useful to withhold information from the participants. Other researchers think that one should never lie to participants. When researchers decide to withhold information from the participants, they have to debrief them after the study and share the actual goals of the study. This is part of Standard 8.08. Usually, participants are also debriefed in studies that have not used deception.

What do the standards tell us about violations in publications?

Most guidelines are about the proper treatment of participants. There are also guidelines about the publication process. It is seen as ethical to publish the results, because the participants made time to take part in the study. Two violations in publications are data fabrication and data falsification (Standard 8.10). Data fabrication means that a researcher does not record what the participant actually did or answered, but enters an invented value that supports his or her hypothesis. Data falsification means that a researcher influences the results by leaving some results out or by influencing the participants. Fabricated or falsified data can lead the field to accept theories that are actually not correct. It can also lead other researchers to waste time on studies that build on incorrect results and ideas.

Another form of violation is plagiarism (Standard 8.11). This means that the ideas and words of another researcher are presented as one’s own, without properly referring to the original author. It can be seen as a form of stealing. In order to prevent plagiarism, an author needs to cite the original author when he/she uses that author's ideas. This needs to be done according to the APA norms that have been described in the first chapter (the last name of the author and the year of publication). Students need to think about these rules when they write a thesis, because they can be punished severely (even expelled) when plagiarism has taken place.

What do the standards say about animal research?

Psychologists use both human and animal participants. According to Standard 8.09, psychologists who use animals need to take good care of these animals, treat them humanely, use as few animals as possible and be sure that the research is important enough. Every country has its own additional institutions that look at how animals in research are treated. In many countries, people look at the three R’s: replacement, refinement and reduction. Replacement means that researchers need to find an alternative to animals when that is possible. Refinement means that researchers need to conduct their study in the way that is least stressful to the animals. Reduction means that a researcher must conduct a study with as few animals as possible.

Most psychologists accept the use of animals in research. However, they do not want the animals to experience pain. Some activists think that animals have rights and that using them as test subjects violates these rights. Other activists think that humans are not more important than animals and that animals may only be used as test subjects if the same study could also be conducted with human beings. Many psychologists treat the animals well and, with the use of animal participants, they have discovered many things that contribute to our basic and applied knowledge. Many psychologists also try to avoid using animals and to find alternative procedures.

What are good measures in psychology? - Chapter 5

How can variables be measured?

When psychologists have decided how to operationalize a variable, they must choose between three different types of measures: observational, self-reports and physiological measures. They must also decide what scale to use. A conceptual variable is the definition of a variable on a theoretical level according to the researcher. The operational variable is the decision about how the variable needs to be manipulated or measured. Every conceptual variable can be operationalized in different ways. The concept ‘wealth’ can be operationalized by looking at the yearly income of a person or by coding the age of the car of that person.

What are the three types of measures?

The three types of measures psychologists use to operationalize concepts fall in the categories of self-report measures, observations and physiological measures. Self-report measures look at the answers people give during an interview or on a questionnaire. For young children, self-reports are often replaced by reports of parents and teachers. Observational measures are also called behavioural measures and they operationalize a variable by recording observable behaviours or their physical traces. Coding how much a person's car costs is an observational measure of wealth. Counting how many tooth marks are present on a pencil is an observational measure of stress. Physiological measures operationalize a variable by looking at biological data, like brain activity and heart rate. This is often done with instruments, like EEG and fMRI. It is best to use all three types of measures to see whether they lead to the same results.

Which scales can be used?

All variables need to have at least two levels. The levels of operational variables can be coded by using different scales. Operational variables are mainly classified as categorical or quantitative. The levels of categorical variables are categories. These variables are also called nominal variables. An example of this is gender, which has the levels male and female. A man can be coded as ‘1’ and a woman as ‘2’. These numbers are only labels and one could just as easily use other numbers. They do not have a numerical meaning, so being a woman is not ‘higher’ than being a man. The values of quantitative variables do have a numerical meaning.

Quantitative variables can be classified on ordinal, interval and ratio scales. An ordinal scale looks at the order. A teacher can hand tests back in order of highest grade to lowest grade. The first pupil has a higher score than the last pupil, but it is not known how large the difference between the first and last pupil is. An ordinal scale does not say anything about the size of the differences between the tests. An interval scale does work with equal intervals (distances) between levels, but it has no true zero: a score of zero does not mean that there is nothing. An IQ test is an example of an interval scale. The difference between 95 and 100 is as big as the difference between 105 and 110. Receiving zero on an IQ test does not mean you have no intelligence. A ratio scale has equal intervals and a true zero point that actually means ‘nothing’ or ‘zero.’ People who do not answer anything correctly on a test will get a score of 0 and this 0 means they have not answered anything correctly. A true zero point makes it possible to say more about the levels. For example, one can state that a person who earns 4000 euros a month earns twice as much as someone who earns 2000 euros per month.

What is reliability in tests and how can it be measured?

How do you know if you have operationalized a variable correctly? How do you know if the measures of a study have construct validity? Construct validity has two aspects: reliability refers to how consistent the results of a measure are, and validity refers to whether a measure actually measures what it is supposed to measure.

What is reliability in tests?

Researchers collect data to be sure that their measures are reliable. Determining reliability is an empirical question. Reliability can be assessed in three ways and all of them are about the consistency of a measure. Test-retest reliability means that a researcher finds consistent scores every time he/she uses the measure. People who have the highest scores on an IQ test should also have the highest scores of the group on the same IQ test a month later. Interrater reliability means that consistent scores are obtained no matter who does the rating. This form of reliability is most important for observational measures. Internal reliability means that a participant gives a consistent pattern of answers across the different items of a scale.

What can be used to evaluate reliability?

There are two statistical instruments that can be used to analyze reliability: scatterplots and correlation coefficients. Evidence for reliability is a special kind of association claim. Test-retest reliability can be depicted in a scatterplot. On the x-axis you can put the first measurement of all the participants and on the y-axis you can put the second measurement of all the participants. When the dots fall close to a straight, upward-sloping line, there is test-retest reliability. Interrater reliability can also be examined with a scatterplot. The values one rater has given to the participants are on one axis and the values the other rater has given are on the other axis. When the dots are close to a straight line, interrater reliability is present.

The strength of such a relationship is usually expressed with a correlation coefficient, r. It states something about the direction and strength of a relationship. When the slope in a scatterplot goes down, r is negative; when the slope goes up, r is positive. The value of r is between -1.0 and 1.0. When the value is close to -1 or 1, the relationship is strong; when the value is close to 0, the relationship is weak. In order to determine test-retest reliability, one must look at measurements at two points in time. When the r between these two measurements is positive and strong (.50 or higher), test-retest reliability is present. When one looks at the scores of two raters and determines that r is positive and strong (.70 or higher), interrater reliability is present. In order to determine the internal reliability of a scale, researchers look at Cronbach’s alpha, which is based on how strongly all items of a scale correlate with each other. It can be computed with statistical software such as SPSS. When the internal reliability is sufficient, all items can be combined into one scale; when it is not sufficient, researchers need to adjust the scale.
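As an illustration of the two statistics mentioned above, here is a minimal Python sketch that uses only the standard library. The scores are made up for the example, and the function names are hypothetical. Cronbach's alpha is computed with the common formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from math import sqrt
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return covariation / spread

def cronbach_alpha(items):
    """items: one list of scores per item, with participants in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per participant
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Made-up IQ scores of five people, measured twice (test-retest reliability)
time_1 = [98, 105, 110, 121, 93]
time_2 = [101, 103, 112, 118, 95]
r_retest = pearson_r(time_1, time_2)
print("test-retest r:", round(r_retest, 2))

# Made-up answers of five people on a three-item scale (internal reliability)
scale_items = [[4, 3, 5, 2, 4], [5, 3, 4, 2, 4], [4, 2, 5, 1, 3]]
alpha = cronbach_alpha(scale_items)
print("Cronbach's alpha:", round(alpha, 2))
```

The same pearson_r function can be applied to the scores of two raters to check interrater reliability.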

Which validities are present in measures?

One should not only look at the reliability, but also at whether the test measures what it is supposed to measure. Does a religion scale really measure how religious someone is? Psychologists often want to measure abstract constructs, for which no comparison standards are present. Construct validity is therefore important in psychological research. We cannot measure happiness directly. We can estimate happiness by looking at different things. We can look at someone’s well-being, how often someone smiles, levels of stress hormones and blood pressure. All these measures are indirect measures. For some abstract constructs, there are no direct measures. How can you determine whether an indirect operational measure of a construct really measures what it is supposed to measure? You can find out by collecting data and evaluating the validity with this data. There are different types of validity one can look at.

Face validity means that a measure looks plausible. It is quite subjective: if it looks like a good measure, then it has face validity. Content validity looks at whether a measure contains all components of a construct. If intelligence can be seen as the ability to plan, solve problems, understand complex ideas, think abstractly, learn quickly and reason, then an operational scale needs to ask questions about all these components.

Most psychologists do not rely only on subjective forms of validity. They also look at whether the measure is associated with the things it should be associated with. Criterion validity looks at whether a measure is related to a concrete outcome, like a behaviour with which it should be associated according to the theory. If an IQ test has criterion validity, it needs to correlate with behaviours that are related to the construct of intelligence (like the things mentioned above). Criterion validity can be assessed with the help of scatterplots and correlation coefficients. Another way to get information about criterion validity is to use the known-groups paradigm. Researchers examine whether the scores on a measure can discriminate between groups whose behaviour is already well understood.

Another form of validity looks at whether there are meaningful patterns of similarities and differences. A valid measure needs to correlate with other measures of the same construct (convergent validity) and it should correlate less strongly with measures of different constructs (discriminant validity). When you want to make a new scale for depression, you should see whether your scale shows similarities with an established scale for depression. When the correlation between these scales is high, you are allowed to say that there is convergent validity. Your scale should not correlate strongly with measures of other constructs (discriminant validity). For instance, it should not correlate strongly with perceived physical health. Convergent and discriminant validity are often determined together. There are no rules about how high or low the correlations need to be. The only rule is that the correlation between related constructs needs to be higher than the correlation between non-related constructs.
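The comparison rule from the paragraph above can be sketched in a few lines of Python. All scores are invented for the illustration, and the variable and function names are hypothetical; the sketch only checks that the correlation with a measure of the same construct is stronger than the correlation with a measure of a different construct.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return covariation / spread

# Invented scores of the same five people on three measures
new_depression_scale = [10, 14, 18, 22, 30]
established_depression_scale = [12, 15, 17, 25, 29]  # same construct
perceived_physical_health = [3, 5, 2, 4, 3]          # different construct

r_convergent = pearson_r(new_depression_scale, established_depression_scale)
r_discriminant = pearson_r(new_depression_scale, perceived_physical_health)

# The only rule: the related correlation must be stronger than the unrelated one
pattern_supports_validity = abs(r_convergent) > abs(r_discriminant)

print("convergent r:", round(r_convergent, 2))
print("discriminant r:", round(r_discriminant, 2))
print("pattern supports validity:", pattern_supports_validity)
```

Absolute values are compared because the strength of a correlation, not its direction, is what matters for this rule.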

How do we use surveys and observations? - Chapter 6

In what way could the construct validity of a survey be improved?

A survey refers to questions that are asked to people by phone, during interviews, on paper, via email or on the internet. Psychologists who construct their questions properly can support frequency claims with good construct validity. Survey questions can have different forms. There are open-ended questions that give participants the possibility to answer however they like. These answers are usually rich in content, but a disadvantage is that the answers need to be coded and categorized. This costs time and it is difficult. That is one of the reasons why psychologists often use other types of questions. Usually, forced-choice questions are used. Respondents choose the best of two or more options. Psychological research often uses Likert scales. Participants are asked how much they agree with a statement. They can choose from a number of options (usually 5), from strongly disagree to strongly agree. When the numeric scale is not anchored with degrees of agreement but with a pair of opposite adjectives, this is called a semantic differential format. For example, 1 can represent easy and 5 can represent difficult. A familiar everyday example is rating products on the internet with one to five stars. Researchers can combine different types of questions in one questionnaire. It is important to remember that using different question formats does not in itself harm construct validity.

What is the best way to formulate questions?

The way in which questions are formulated and asked can have an influence on construct validity. Every question needs to be clear so that it can be answered directly. People who make questionnaires need to make sure that the formulation and order of the questions do not influence the participants. Compare the following two versions of a question from a survey on race relations in the US:

  • Do you think that the relationship between Blacks and Whites
    • will always be problematic?
    • or that a solution will eventually be found?
  • Do you think that the relationship between Blacks and Whites
    • is as good as it will get?
    • or eventually get better?

Only 45% of the people who were asked the first version of the question were optimistic about race relations. Of the people who were asked the second version, 73% were optimistic. The questions had been formulated differently: the first question was formulated negatively, with the words ‘problematic’ and ‘solution’, and the second question was formulated positively, with the words ‘good’ and ‘better.’ Questions whose wording pushes respondents towards a certain answer are called leading questions. People who make questionnaires need to formulate the questions as neutrally as possible, otherwise they will not find the real thoughts and opinions of their respondents.

Sometimes a question is formulated in such a complicated way that a respondent has trouble giving an answer that reflects his/her opinion accurately. It is always best to ask a question as simply as possible. When people understand the question, they can give a clear and direct answer. However, sometimes researchers forget this rule and put two questions into one item. These are called double-barrelled questions. These questions have weak construct validity, because people can give an answer to the first question, the second question or both questions. The item can therefore measure the first construct, the second construct or both constructs. The questions should be asked separately.

Sometimes, the negative wording of a question can make the question difficult. With the term negative we do not mean negative words, like ‘bad’ or ‘problematic’, but negations. One survey appeared to show that 20% of Americans denied that the Holocaust happened. This caused quite a stir and researchers wanted to know whether the study had been conducted properly. They discovered that the question was difficult: ‘Does it seem possible or impossible to you that the Nazi extermination of the Jews never happened?’ Most people find the double negative (‘impossible’ and ‘never’) difficult. This question did not measure the beliefs of people, but the extent to which they used their working memory and motivation to answer the question. The question had poor construct validity and did not measure the true beliefs of people. Sometimes even one negative word can make a question difficult. Often, researchers ask the same question in both a positive and a negative way and assess the internal consistency of the two answers (if you agree with a positively stated question, you should not agree with the negatively stated question). One should be careful with negatively worded questions, because they can reduce construct validity. Sometimes the answers to these questions say more about the motivation and ability to do cognitive work than about the true opinions of people.

The sequence of questions can also influence the answers people give. Say, for instance, that some people support actions that improve the circumstances of women, but do not support actions that improve the circumstances of ethnic minorities. When they are first asked whether they support actions for women and then whether they support actions for ethnic minorities, their answers can be different than when the sequence of the questions is reversed. People want to be consistent, so when they are first asked whether they support actions for women and they answer positively, they are more likely to state that they also support actions for minorities. The best way to control for the influence of sequence is to make different versions of the questionnaire, each with a different sequence, and to see whether the results show an effect of sequence.
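Making different versions of a questionnaire can be sketched as follows in Python. The item names, respondent labels and answers are invented, and the function names are hypothetical. The sketch builds every possible question order, spreads respondents randomly but evenly over the versions, and compares the mean answer per version.

```python
import random
from collections import defaultdict
from itertools import permutations
from statistics import mean

def make_versions(questions):
    """Every possible order of the questions, one list per questionnaire version."""
    return [list(order) for order in permutations(questions)]

def assign_versions(respondents, n_versions, seed=None):
    """Shuffle the respondents, then hand out the version numbers in turn."""
    rng = random.Random(seed)
    shuffled = list(respondents)
    rng.shuffle(shuffled)
    return {person: index % n_versions for index, person in enumerate(shuffled)}

def mean_by_version(assignment, answers):
    """Mean answer on one item, separately for each questionnaire version."""
    grouped = defaultdict(list)
    for person, version in assignment.items():
        grouped[version].append(answers[person])
    return {version: mean(values) for version, values in grouped.items()}

questions = ["support_actions_women", "support_actions_minorities"]
versions = make_versions(questions)

respondents = ["R1", "R2", "R3", "R4", "R5", "R6"]
assignment = assign_versions(respondents, len(versions), seed=1)

# Invented answers (1 = strongly disagree, 5 = strongly agree) on the minorities item
minorities_answers = {"R1": 4, "R2": 2, "R3": 5, "R4": 3, "R5": 4, "R6": 2}
print(versions)
print(mean_by_version(assignment, minorities_answers))
```

A clear difference between the version means would suggest an effect of sequence. With more than a handful of questions the number of possible orders grows very quickly, so in practice researchers would likely use only a few of the orders.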

How can participants be encouraged to answer accurately?

Participants can give answers that are less accurate. They do not always do this on purpose. Sometimes they are not trying hard enough to give accurate responses or want to be seen as good or are not able to give accurate answers on questions about their motives and behaviours. However, self-reports are usually ideal. Most people are able to answer questions about their demographics and perspectives. Sometimes self-reports are the only options. When you want to know what someone dreams about or his level of anxiety, you have to ask the person.

Response sets are quick responses a person can give when answering a questionnaire. Sometimes people do not think about some questions and answer all questions as negative, positive or neutral. Response sets can weaken the construct validity, because people do not say what they actually think. One form of a response set is acquiescence. This means that someone keeps saying ‘yes’ or ‘strongly agree’ on questions. One way to check whether someone keeps saying yes to answers without meaning it or whether this person truly agrees with the questions, is to reverse the questions. The question ‘I love candy’ should be reversed into ‘I do not love candy.’ Someone who truly loves candy will agree with the first question and disagree with the second. Another response set is fence sitting. That means that people tend to choose the middle response of the scale. This is usually done when a question is controversial or difficult. One way to reduce this is to delete the middle answer option. One disadvantage is that people who really do not have an opinion about something or who are neutral do not have a way to express themselves.
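The use of reversed questions can be illustrated with a small Python sketch. The item names and answers are made up and the function names are hypothetical. On a scale from 1 to 5, a reversed item is recoded as 6 minus the answer, so that a high score always means the same thing. After recoding, someone who simply agrees with everything ends up in the middle of the scale.

```python
from statistics import mean

def reverse_score(score, scale_min=1, scale_max=5):
    """Recode an answer on a reversed item (on a 1-5 scale, 5 becomes 1)."""
    return scale_min + scale_max - score

def scale_score(answers, reversed_items, scale_min=1, scale_max=5):
    """Mean score over all items, after recoding the reversed items."""
    recoded = [
        reverse_score(value, scale_min, scale_max) if item in reversed_items else value
        for item, value in answers.items()
    ]
    return mean(recoded)

reversed_items = {"i_do_not_love_candy"}

# Made-up answers: 1 = strongly disagree, 5 = strongly agree
true_candy_lover = {"i_love_candy": 5, "i_do_not_love_candy": 1}
yea_sayer = {"i_love_candy": 5, "i_do_not_love_candy": 5}

print(scale_score(true_candy_lover, reversed_items))
print(scale_score(yea_sayer, reversed_items))
```

The person who truly loves candy keeps a high scale score, while the acquiescent respondent's contradictory answers cancel each other out.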

Most people want to be seen as good human beings by others and sometimes a question gives options that can make a person look better than he/she is. These questions have a low construct validity, because people are more likely to choose the answers that make them look good. Sometimes participants are ashamed or worried about giving an unpopular answer. One way to reduce this is to ensure participant anonymity. Another way is to ask friends and relatives, who know the person quite well. Sometimes computer programmes are used to measure implicit opinions. In that way, participants usually do not know what the real subject of the study is and their answers will not be influenced.

Sometimes self-reports can be inaccurate, because people do not know why they think something or behave in a certain way. Their memories of certain events can also be inaccurate. Asking people what happened in the past is usually not the best way to find out what really happened. Self-reports are not appropriate for every question. A survey is appropriate for questions that are subjective: what a person thinks he/she does and what he/she thinks influences his/her behaviour. But if you want to know what people really do or what really influences their behaviour, then you need to observe these people.

What is construct validity of behavioural observations?

When a researcher observes the behaviour of animals or humans and records it systematically, he/she is conducting observational research. Some researchers think that observations are better than self-reports, because some people are not able to accurately answer questions about their behaviour and events from the past. Observations can form a basis for frequency claims. For instance, one can look at how often someone eats at a fast-food restaurant in a week, how often parents scream negative things during a match of their children and how often cars stop at a pedestrian crossing. One example of an observational study is the research of Mehl on how many words people say in a day. Every participant carried an electronic device that recorded what they said during the day. Researchers coded how many words each participant said during a day and the results showed that women said slightly more words than men. However, this difference was not statistically significant, so the study does not support the claim that women talk more than men.

Why are observations sometimes better than self-reports?

If the participants from the previous example had been asked to count the number of words they say during the day, many would have done this incorrectly. Researchers are really careful with observations, because they want the observations to be accurate and valid. Observations have a good construct validity when they can avoid the following three problems: observer bias, observer effects and reactivity.

Observer bias occurs when the expectations of an observer influence their interpretation of the behaviour of participants. They then do not judge the observations objectively, but according to their own expectations. Observer effects occur when the observer changes the behaviour of the person or animal he/she is observing, so that the behaviour comes to match the expectations of the observer. In one study, students each received a rat and had to time how fast their rat learned to run a maze. The rats were genetically the same, but the researchers told some students that their rats were smart maze runners and told other students that their rats were lazy maze runners. It turned out that the ‘smart’ rats improved their times every day, while the ‘lazy’ rats did not. The observers did not just see what they wanted to see, they caused the behaviour of the observed animals to match their expectations. To prevent observer bias and observer effects, you can use codebooks, in which it is explained how to code certain behaviours. Another way is to use a blind design, in which the observer does not know which condition a participant is in and therefore cannot influence the behaviour of the participant.

Reactivity means that people change their behaviour when they know someone is watching them. Reactivity occurs with both human participants and animal participants. One way to prevent this is to blend in. You can, for example, use a one-way mirror to study the behaviour of participants without them seeing you. Another way is to let the participants get used to you being around. A third way is to look at the traces a behaviour leaves behind, instead of the behaviour itself. Someone can state that he/she is a careful driver, but his/her traffic tickets can show otherwise.

Most psychologists think it is ethical to observe behaviours in public settings. When secret recordings are made during a study, a researcher needs to have a good reason for doing so and has to tell the participant afterwards that a recording has been made and for which purpose. If the participant does not agree with the use of the recording, the researcher needs to delete that footage without looking at it.

Survey refers to questions that are asked to people by phone, during interviews, on paper, via email or on the internet. Psychologists who construct their questions properly can support frequency claims with good construct validity. Survey questions can have different forms. Open-ended questions allow participants to answer however they like. These answers are usually rich in content, but a disadvantage is that they need to be coded and categorized, which is difficult and costs time. That is one of the reasons why psychologists often use other types of questions. Forced-choice questions are common: participants pick the best of two or more options. Psychological research also often uses Likert scales. Participants are asked how much they agree with a statement and choose from a number of options (usually 5), ranging from strongly disagree to strongly agree. When the numeric scale is not anchored by degrees of agreement but by a pair of opposite adjectives, this is called a semantic differential format. For example, 1 can represent easy and 5 can represent difficult. A familiar example is rating products on the internet with one to five stars. Researchers can combine different types of questions in one questionnaire. It is important to remember that the question format by itself does not make or break construct validity.

How to estimate the frequencies of behaviours and attitudes? - Chapter 7

What is generalizability?

When you test external validity, you wonder whether the results of a certain study can be generalized to a bigger population. External validity is really important for frequency claims. You wonder whether the results found for your participants would also be found in the entire population. Does your sample represent the entire population of interest? External validity does not just concern the people in a sample, but also the setting. A researcher might not want to know whether the results of a study can be generalized to the other people in the population, but he might want to know whether the results can be generalized to other settings, like other products of the same company or other courses from the same teacher.

What are samples?

A population can be seen as the whole set of people or products a researcher is interested in. A sample is a smaller set from that population. When you want to know how a new flavour of Lay's crisps tastes, you only have to grab one crisp from the bag. All other crisps from that bag will taste the same and you do not have to eat the whole bag in order to know how the crisps taste. If you tasted all the crisps in the bag, you would be conducting a census. Researchers do not have to study an entire population. They assume that a sample says something about the entire population. The external validity of a study is about the adequacy of the sample to represent the population that was not studied.

There are many populations researchers can study. Before researchers can decide whether a sample is biased or not, they have to specify a population. This is called the population of interest. Researchers can have a wide interest (like the entire population of the Netherlands) or a specific interest (all women who studied Psychology in Groningen). Only when you have a particular population in mind are you able to speak about the generalizability of a sample. A sample can only represent a population if the sample comes from that population. However, this does not mean that every sample from the population represents the entire population. When a sample consists of Dutch people, it does not automatically represent the entire Dutch population. Perhaps the researcher has only looked at rich Dutch people. A sample can be either representative or biased. In a biased sample some members of the population of interest have a higher chance to be selected for the sample than other members of the population. In a representative sample, all members of the population have an equal chance to be selected for the sample. Only representative samples enable us to draw conclusions about the population.

When is a sample biased?

A sample might sometimes contain too many unusual members of a population. A sample can be biased in at least two ways. Scientists might only study people they can easily come into contact with or only people who are eager to participate in a study. This can lower the external validity of a study, because people who are easy to reach may have other attitudes than people who are hard to reach. Many studies use so-called convenience sampling. That is a sample of people who happen to be available. Usually, those are psychology students. A convenience sample can also arise when researchers are not able to come into contact with a certain subgroup. Sometimes researchers are just not able to study people who live too far away, who do not come to their research appointment or who do not answer their phone. This can result in a biased sample. A sample can also be biased through self-selection. This means that a sample contains only people who volunteered to participate in the study. Self-selection is common in online research and internet polls. Internet users rate the products they have used and the people who write those ratings are usually not representative of the entire population of people who own that product.

What are sample techniques?

If researchers really want to have a representative sample, they could best use a probability sample. Probability sampling is also called random sampling. This means that every member of the population of interest has an equal chance to be chosen for the sample. Because all members of a population have an equal chance to be chosen, the results of the samples can be generalized to the entire population. Random sampling is good for external validity. Nonprobability sampling involves non-random sampling, which can also result in a biased sample.

The basic form of random sampling is simple random sampling. You can imagine it as follows: the name of every member of the population of interest is written down on a piece of paper and put into a hat. Then you draw a certain number of names from the hat. Another way of simple random sampling is to give each person a number and to use a table of random numbers to select the numbers of the people who will participate in the study. Simple random sampling may take a lot of time, because it is difficult to assign a number to every member of a population. In a cluster sample, clusters of participants from a population are randomly selected and then all individuals in all selected clusters are used. Multistage sampling is similar, but two random samples are drawn: first, a random sample of clusters is chosen and then a random sample of people within each selected cluster is taken.
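
The three techniques from the previous paragraph can be compared in a short Python sketch. The population of 20 schools with 30 students each, the sample sizes and the fixed seed are all made up for the illustration; only the sampling logic follows the descriptions above.

```python
import random

rng = random.Random(42)  # fixed seed so that the sketch is repeatable

# Hypothetical population: 20 schools (clusters) with 30 students each.
population = {
    f"school_{s}": [f"school_{s}_student_{i}" for i in range(30)]
    for s in range(20)
}
everyone = [person for members in population.values() for person in members]

# Simple random sample: names out of a hat, every individual has the same chance.
simple = rng.sample(everyone, 60)

# Cluster sample: randomly pick clusters, then use everyone in those clusters.
chosen_clusters = rng.sample(sorted(population), 2)
cluster = [person for c in chosen_clusters for person in population[c]]

# Multistage sample: randomly pick clusters, then randomly sample within each.
stage_one = rng.sample(sorted(population), 6)
multistage = [person for c in stage_one for person in rng.sample(population[c], 10)]

print(len(simple), len(cluster), len(multistage))
```

All three samples contain 60 people, but the cluster sample only requires a list of schools plus the rosters of two schools, which is why it is often cheaper than simple random sampling.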

Another technique is stratified random sampling. A researcher selects certain demographic groups and then conducts a random selection of individuals within each of these groups. For instance, researchers want their sample of 2000 Canadians to contain South-Asians in the same proportion as the entire Canadian population. 4% of the Canadian population is South-Asian, so 80 of the 2000 people in the sample need to be South-Asian Canadians. There are two strata in this study: the South-Asian Canadians and the other Canadians. Within each stratum, members are selected randomly. A variation of stratified random sampling is oversampling. This means that the researcher intentionally over-represents one or more groups. A researcher can decide to do that when the subgroup is only a small percentage of the entire group (like the 4% of South-Asian Canadians). The researcher can decide to include 200 South-Asian Canadians in the study instead of 80. The South-Asian Canadians are then 10% of the sample, while they are only 4% of the entire population. With oversampling, the results are adjusted afterwards so that the over-represented group is weighted according to its actual proportion in the population. Oversampling is still done in a random way.
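
The arithmetic of the Canadian example can be written out as follows. The function names are invented for this sketch, and the weighting shown (population share divided by sample share) is one common way to adjust for oversampling, not necessarily the exact procedure used in the book.

```python
def stratum_sizes(sample_size, population_shares):
    """Number of people to draw from each stratum for a proportional sample."""
    return {name: round(sample_size * share) for name, share in population_shares.items()}


def adjustment_weights(sample_counts, population_shares):
    """Weight per person: the stratum's population share divided by its sample share."""
    total = sum(sample_counts.values())
    return {
        name: population_shares[name] / (count / total)
        for name, count in sample_counts.items()
    }


shares = {"South-Asian": 0.04, "other": 0.96}

proportional = stratum_sizes(2000, shares)
print(proportional)  # {'South-Asian': 80, 'other': 1920}

# Oversampling: 200 South-Asian Canadians instead of 80.
oversampled = {"South-Asian": 200, "other": 1800}
weights = adjustment_weights(oversampled, shares)
print(weights)  # South-Asian answers count for about 0.4, the others for about 1.07
```

After weighting, the 200 South-Asian respondents together count as 80 people again, so the overall estimate reflects the 4% they represent in the population while the subgroup itself is measured more precisely.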

In systematic sampling, a computer or table of random numbers is used and the researcher selects two random numbers, like 3 and 6. When the population consists of a gym full of athletes, the researcher starts with the third person and then selects every sixth person, until the sample is big enough. Often, researchers combine different sampling techniques in one study. As long as the selection is done in a random way, the sample will represent the population. However, be aware that random sampling is not the same as random assignment. Random assignment is done in experimental designs. Researchers want to put participants in different groups (conditions) and they want to do that in a random way. Random assignment enhances the internal validity by ensuring that the treatment group and the comparison group contain the same type of people.
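
The difference between selecting people (sampling) and dividing them over conditions (assignment) can be made concrete with a small sketch. The list of 60 athletes, the helper names and the seed are assumptions for the example; the numbers 3 and 6 come from the text.

```python
import random


def systematic_sample(members, start, step, size):
    """Begin with the start-th member (counting from 1) and take every step-th one."""
    return members[start - 1::step][:size]


def random_assignment(participants, rng):
    """Shuffle the participants and split them into two equally sized conditions."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


athletes = [f"athlete_{i}" for i in range(1, 61)]  # hypothetical gym

# Sampling decides WHO takes part: athlete_3, athlete_9, athlete_15, ...
sample = systematic_sample(athletes, start=3, step=6, size=10)
print(sample[:3])

# Assignment decides WHICH CONDITION each sampled person ends up in.
treatment, comparison = random_assignment(sample, random.Random(0))
print(len(treatment), len(comparison))
```

Sampling affects external validity, because it determines whom the results can be generalized to. Assignment affects internal validity, because it makes the two groups comparable.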

Can researchers choose biased sampling techniques?

When external validity is not important for a researcher, he can decide to choose a biased sample. Convenience sampling means that a researcher uses people who are easily accessible. When researchers only want to ask people from a certain subgroup and these people are not chosen in a random way, they are using purposive sampling. Another form of purposive sampling is snowball sampling. The participant is asked to ask a couple of relatives or acquaintances he/she knows to participate in the research. Of course, this is not a representative way to sample people, because people ask others from their social networks and that is not a random way. In quota sampling, a researcher identifies a subpopulation and he chooses how big every subpopulation in the sample will be. Then, he chooses the people from this population in a non-random way.

What is most important in external validity?

Frequency claims are claims about how often something occurs in a population. This is often expressed in percentages. External validity is really important for frequency claims and one must therefore look at sampling techniques. Sometimes the external validity of samples that are based on random sampling can be confirmed. For example, the polls conducted before elections sometimes closely match the outcomes of the elections. However, it is often difficult to check the accuracy of samples, because researchers cannot study an entire population to see whether the actual percentage is the same as the sample percentage. The only thing they can do is check whether their sampling techniques are appropriate. As long as a random sample is used, you can be more confident about the external validity of your results.

What if a representative sample is not that important?

External validity is often important for frequency claims, but external validity is not always a major priority for researchers. That can be the case for researchers who want to research association and causal claims. Many association or causal claims can be detected accurately with a convenience sample. In frequency claims, you have to wonder whether it is important for a sample to have been randomly conducted. Is the reason for a biased sample relevant for your claim? Are the characteristics of the population that make a sample biased relevant for what you are measuring? If they are not important, you can sometimes trust non-representative samples.

Are bigger samples always better?

One of the biggest myths in research is that bigger samples are always better. When a phenomenon is rare, researchers do need a big sample in order to find enough instances of it. For most other purposes, the way a sample was selected matters much more than its size. Usually, researchers need only about 1000 people, even when they want to study a population as big as that of the United States. The bigger the sample, the smaller the margin of error. Beyond a sample of 1000 people, however, you need many more people to make the margin of error only slightly smaller (with 1500 people the margin of error still rounds to 3% and with 2000 people it is about 2%). 1000 is therefore seen as an optimal balance between effort and accuracy. A sample of 1000 people enables one to generalize the results to the entire population, as long as the sample has been chosen randomly. Sample size is not an issue of external validity, but of statistical validity.
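
The diminishing returns of a bigger sample can be checked with the standard formula for the 95% margin of error of a percentage. The sketch assumes a simple random sample and the worst case of a 50% result (p = 0.5). These assumptions are not stated in the summary, but they are the usual ones behind poll figures such as ‘plus or minus 3%’.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (1000, 1500, 2000):
    print(n, round(100 * margin_of_error(n), 1))
# 1000 3.1
# 1500 2.5
# 2000 2.2
```

Doubling the sample from 1000 to 2000 people reduces the margin of error by less than one percentage point, because the margin shrinks with the square root of the sample size.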

Why not do some bivariate correlation research? - Chapter 8

Association claims are claims that describe the relationship between two measured variables. A bivariate correlation is also called a bivariate association between two variables. In order to study an association, one has to measure one variable and then the other variable. This has to be done for the same group of people. Then, statistical methods and graphs are used to depict the type of relationship between the variables. A relatively large number of studies are correlational. One example of correlational research is the study of John Cacioppo on online love and marriage satisfaction. Cacioppo and his colleagues were interested in the relationship between meeting a spouse online and marriage satisfaction. They sent an online questionnaire to thousands of people who used uSamp (an online research centre). Participants answered questions about where they met their spouse (whether online or offline). The researchers also measured the marriage satisfaction with the Couple Satisfaction Index (CSI). One question from this index was ‘Indicate the degree of happiness of your marriage.’ Participants could give their answer on a seven-point scale (which ranged from ‘extremely unhappy’ to ‘perfect’). The research showed that people who had met each other online scored higher on the CSI. Of course, a correlational association does not show a causal relation and people should be cautious with drawing conclusions from this research.

How can the association between two variables be described?

After having collected the data, you describe the relationship between the two measured variables with scatterplots and the correlation coefficient r. To depict the variables, you plot the scores of every participant as a dot in a scatterplot and then draw a line through the cloud of dots. When the line runs from the lower left corner to the upper right corner, we can say that there is a positive relationship. A positive relationship means that a high score on one variable goes together with a high score on the other variable. When the line runs from the upper left corner to the lower right corner, we speak of a negative relationship. High scores on one variable go together with low scores on the other variable. The strength of the correlation can be expressed with the correlation coefficient r. This is a number between -1 and 1. A correlation of .10 or -.10 has a weak effect size. An r of .30 or -.30 has a medium effect size. A correlation of .50 or -.50, or further from zero, has a large effect size. The coefficient r shows the direction (positive or negative) and the strength of the relation.
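
The correlation coefficient can be computed by hand as the summed co-variation of the two variables divided by the square root of the product of their summed squared deviations. The sketch below implements that textbook formula. The study hours and exam grades are made-up numbers, and the label ‘negligible’ for values below .10 is an addition for the example, not a category from the book.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def effect_size_label(r):
    """Label the strength of r using the .10 / .30 / .50 benchmarks."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "weak"
    return "negligible"  # assumed label, not from the book


# Hypothetical data for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_grade = [2, 4, 5, 4, 5]

r = pearson_r(hours_studied, exam_grade)
print(round(r, 2), effect_size_label(r))  # 0.77 large
```

The sign of r gives the direction of the relationship and its absolute value gives the strength, which is why the label is based on `abs(r)`.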

How can associations with categorical data be described?

In the example about the study of Cacioppo, one of the variables was a categorical variable. That was the variable about meeting your spouse online or offline. People are only able to answer ‘online’ or ‘offline’ to that question. The values of a categorical variable fall into one category or another. The other variable, marriage satisfaction, was quantitative. People could choose from seven different possible answers. When both variables of an association are measured with quantitative scales, it is possible to make a scatterplot. The data is best represented in that way. A scatterplot is not the best way to represent data that includes a categorical variable. The dots that represent the participants will be stacked vertically in one column for meeting a spouse online and in another column for meeting the spouse offline. In a scatterplot with a categorical variable it is difficult to see whether the relationship is positive or negative. It is better to use a bar graph. In a bar graph, individuals are not represented as separate dots, but the average of each category is depicted. With a bar graph, one is able to look at the differences between the group means (category means).

When at least one of the variables in an association claim is categorical, different statistical measures can be used to analyze the data. Sometimes r can be used, but it is more common to test whether the differences between the means are statistically significant. This is often done with the t-test. It might seem odd that association claims can be depicted with both scatterplots and bar graphs and that they can be described by different statistical methods. It does not matter what kind of graph or statistical measure you use: if both variables are measured, a study is correlational. Experiments are better for causal claims. An association claim is not supported by a certain graph or a certain statistical measure; it is supported by the design of the study with measured variables.

How can association claims be researched?

The most important validities that have to be researched in association claims are construct validity and statistical validity. Sometimes, the external validity can also be studied. Internal validity is not important in association claims.

What is construct validity in association claims?

An association claim describes the relationship between two measured variables and it is therefore important to look at the construct validity of both variables. That means that you have to look at how well each variable was measured. One thing you can ask yourself is whether the measure was reliable and whether it measures what it is supposed to measure. You might also ask what evidence there is for the face validity, concurrent validity, convergent validity and discriminant validity of each measure.

What does statistical validity in association claims mean?

When you look at the statistical validity of an association claim, you want to know whether, and which, factors have influenced the scatterplot, the correlation coefficient r, the bar graph or the difference between means that led to the association claim. One has to look at the effect size, the statistical significance of the relationship, outliers in the data, restriction of range and whether the relationship is curvilinear.

What does effect size mean in association claims?

The effect size describes the strength of a relationship. Some associations are stronger than other associations. When there are two associations, the association with the r closest to 1 or -1 is the stronger one. Stronger effect sizes go together with more accurate predictions. The prediction error decreases when the effect size increases. Strong effect sizes are usually also more important than small effect sizes. Of course, there are exceptions to this rule. When it comes to matters of life and death, even a small effect size can be important. In one study on heart attacks, one half of the participants received an aspirin every day, while the other half received a placebo. It turned out that one aspirin a day was associated with fewer heart attacks, but the effect size of this study was r = .03. This result was taken seriously.

What does statistical significance mean for association claims?

It is often the case that the results of the sample and population mirror each other. However, sometimes there is no association between the two variables in a population, but a study might find an association in the sample. The correlation in that sample is caused by chance. We should therefore ask ourselves if there is a real association in the population or if the association in the sample was found by chance. Statistical significance calculations provide a probability estimate, p. The p is the probability of finding an association this strong in the sample if the association in the population were zero. If that probability is smaller than .05, we can assume that it is unlikely that the results came from a population with a zero association. The correlation is then seen as statistically significant. When the results show a high p (.05 or higher), the results are not statistically significant. In that case, a researcher cannot rule out that the results came from a population in which the association is zero. Significance is also related to effect size: the stronger the correlation (high effect size), the higher the chance that the correlation will be statistically significant. Statistical significance depends on both effect size and sample size. A small effect size can be statistically significant if it was found in a large sample (1000 or more participants). A small sample is easily influenced by chance. Weak correlations that are based on small samples are more likely to be the result of chance and they will be seen as non-significant. In scientific articles you can read about the significance of a study. You can find the significance by looking for a p, but sometimes statistically significant results are marked with an asterisk (that is a *).
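
How effect size and sample size together determine significance can be illustrated with an approximate test. The sketch uses the Fisher z transformation with a normal approximation to obtain a two-sided p for a correlation. This is a stand-in chosen because it only needs the standard library. Statistical software would normally use a t-test instead, and the combinations of r and sample size are invented for the example.

```python
import math


def approx_p_value(r, n):
    """Approximate two-sided p for a correlation r found in a sample of n people.

    Assumption: Fisher z transformation with a normal approximation,
    which is only a rough guide when n is small.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


weak_small = approx_p_value(0.10, 30)     # weak correlation, small sample
weak_large = approx_p_value(0.10, 1000)   # same correlation, large sample
strong_small = approx_p_value(0.50, 30)   # strong correlation, small sample

for label, p in [("r=.10, n=30", weak_small),
                 ("r=.10, n=1000", weak_large),
                 ("r=.50, n=30", strong_small)]:
    verdict = "significant" if p < 0.05 else "not significant"
    print(label, verdict)
```

The same weak correlation of .10 is not significant with 30 participants but is significant with 1000, while a strong correlation of .50 already reaches significance in the small sample.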

Do outliers have an influence on the association of variables?

Outliers are extreme scores. They are scores that deviate from the other scores. Outliers may sometimes have a large effect on the correlation coefficient r. Outliers can cause problems for association claims. In bivariate correlations, outliers are problematic when they have extreme scores on both variables. When you study an association claim, you first have to find out if there are any outliers present in the data. You might find these outliers by looking at the scatterplot. Looking at outliers is especially important when you have a small sample. When a sample consists of 600 participants that all score in the middle, one outlier that scores extremely will not have a lot of influence. However, an outlier will have a big influence in a sample of 16 people who score all in the middle.
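
The influence of a single outlier in a small versus a large sample can be demonstrated with constructed data. In the sketch below the ‘middle’ scores are built so that their correlation is exactly zero, and one person with an extreme score on both variables is then added. The scores, the outlier (12, 12) and the helper names are all invented for the illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def middle_scores(values, copies):
    """Every combination of the given scores, repeated: a cloud with r = 0."""
    pairs = [(x, y) for x in values for y in values] * copies
    return [x for x, _ in pairs], [y for _, y in pairs]


def r_with_outlier(xs, ys, outlier):
    """Correlation after adding one person with the given (x, y) scores."""
    return pearson_r(xs + [outlier[0]], ys + [outlier[1]])


small_x, small_y = middle_scores([4, 5, 6, 7], copies=1)       # 16 people
large_x, large_y = middle_scores([3, 4, 5, 6, 7], copies=24)   # 600 people
outlier = (12, 12)  # extreme on both variables

print(round(r_with_outlier(small_x, small_y, outlier), 2))  # 0.67
print(round(r_with_outlier(large_x, large_y, outlier), 2))  # 0.04
```

One extra person turns a zero correlation into an apparently large one in the sample of 16, but barely moves it in the sample of 600.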

Are there range restrictions?

If in a correlation study the whole range of scores of a variable in an association is not present, the correlation might look smaller than it really is. This is called a range restriction. That means you have not shown all values that are available. When researchers suspect there is a range restriction, they can decide to use a statistical method, the correction for restriction of range. Restriction of range can be present if, for some reason, there is not much variation in one of the variables. When you want to look at the correlation between the income of parents and the academic achievement of a child, you have to look at all incomes. So, you should not only look at the middle-income parents, but also at the low-income and high-income parents.
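
The income example can be simulated to show how a restricted range makes a correlation look smaller. The data below are entirely made up: achievement rises with income and a fixed amount of noise is added. The correlation is then computed once for all parents and once for the middle-income parents only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: income in arbitrary units from 0 to 99; achievement
# follows income, with alternating noise of +15 and -15.
incomes = list(range(100))
achievement = [x + (15 if x % 2 == 0 else -15) for x in incomes]

r_full = pearson_r(incomes, achievement)

# Range restriction: keep only the middle-income parents (40 to 59).
middle = [(x, y) for x, y in zip(incomes, achievement) if 40 <= x < 60]
r_restricted = pearson_r([x for x, _ in middle], [y for _, y in middle])

print(round(r_full, 2), round(r_restricted, 2))  # 0.89 0.29
```

The underlying relationship is identical in both calculations. Only the variation in income differs, and with little variation the noise dominates, so the correlation looks much weaker.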

Is the relationship curvilinear?

When a researcher states there is no relation between the variables, it might be true that there indeed is not an association between the variables. However, in some cases the relationship can be curvilinear. This means that the relation between the two variables can not be depicted as a straight line. It is possible that the relation is positive in the beginning, but at some point it turns negative. One example of this is health care. When people get older, they need less health care until a certain age. However, after a certain age, one needs more health care. So, there is a curvilinear relation between age and health care.
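
A curvilinear relationship can produce an r of zero even though the two variables are clearly related. The sketch below uses invented numbers in which the need for health care falls until age 40 and rises afterwards. The turning point and the values are assumptions for the example only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical U-shaped data: health-care use is lowest at age 40.
ages = list(range(0, 81, 10))                       # 0, 10, ..., 80
care_use = [(age - 40) ** 2 / 100 for age in ages]  # 16, 9, 4, 1, 0, 1, 4, 9, 16

r_all = pearson_r(ages, care_use)
r_younger = pearson_r(ages[:5], care_use[:5])   # ages 0 to 40
r_older = pearson_r(ages[4:], care_use[4:])     # ages 40 to 80

print(round(r_all, 2), round(r_younger, 2), round(r_older, 2))
```

Over the whole age range the straight-line correlation is zero, while within each half it is strongly negative or strongly positive. This is why a scatterplot should be inspected before concluding that two variables are unrelated.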

Can causal inference be made about association claims?

Correlation does not mean causality. People who have not studied psychology and who have read about Cacioppo’s study on marital satisfaction and online dating might wrongly conclude that dating someone online results in happy marriages. They have, wrongly, ascribed causality to a correlation. You have to remember that correlation is not causation! A plain association cannot establish causation. One needs covariance, temporal precedence and internal validity for causation. In a correlational design with two variables, you do not always know which variable preceded the other and whether one variable has caused the other. You also do not know whether there is a third variable that has influenced one or both variables. Causality can only be studied with experiments. When a third variable causes a correlation between two variables, we say there is a spurious association.

To what extent can the association be generalized?

External validity looks at whether association claims can be generalized to other people, times and places. Sometimes a bivariate correlational study has not used a random sample, but that does not mean that there is no association. You can accept the results of the study and leave the part of generalization to future studies. Many associations do generalize to the population.

When the relationship between two variables in an association study changes depending on the level of a third variable, we call that third variable a moderator. Moderators give us information about the external validity. When an association is moderated by a third variable, some results might not be generalizable to other settings or groups of people.

What is multivariate correlational research? - Chapter 9

Association claims can give a lot of information. A popular example of an association is that children who see much violence on the television, become aggressive. However, that does not say anything about causality. You want to know if children really become aggressive by watching violent programmes, in order to come up with interventions. The best way to test causality, is by using an experiment. Techniques that go further than correlations will be discussed.

Longitudinal research and multiple regression designs involve more than two measured variables and are called multivariate designs. These designs do not fully satisfy the criteria for causality, but they are used often and are an alternative to experiments. The example of watching violent programmes and aggressive behaviour is an example of bivariate correlational research. It does not meet all three criteria of causality. That type of research can show covariance: studies have found a correlation of .35 between watching violent programmes and aggressive behaviour. However, with this design it is not possible to show what came first: watching violent programmes and then becoming aggressive, or being aggressive and then watching violent programmes. Internal validity is also weak, because the relation between watching violent programmes and aggressive behaviour could be explained by a third variable. With bivariate designs, it is difficult to say which variable came first and whether another variable influenced the relationship.

How can temporal precedence with longitudinal designs be shown?

Longitudinal designs can show temporal precedence by measuring the same variables in the same people at different times. Longitudinal designs are often used in developmental psychology to study changes in certain human characteristics. In the 1960s and 1970s, Eron conducted a study on watching violent programmes and aggressive behaviour. He asked young children at school to list their four favourite television programmes and also asked all the children which child in their class picked fights the most and hit, kicked, punched and bullied the most. Ten years later he asked the same children the same questions. This research is longitudinal and it measured four variables: watching violent programmes at time 1, watching violent programmes at time 2, aggression at time 1 and aggression at time 2.

How can results of longitudinal designs be interpreted?

A multivariate correlational design contains more than two variables and therefore yields more than one correlation. These can be cross-sectional correlations, autocorrelations and cross-lag correlations. Cross-sectional correlations test whether two variables that have been measured at the same time point correlate. The Eron study showed that watching violent shows at a young age correlated with aggressiveness at a young age. Then, the researcher looked at whether each variable correlated with itself across time points. These are called autocorrelations. He looked at whether the preference for violent programmes at a young age correlated with a preference for violent programmes in the teenage years and whether aggressive behaviour in the younger years correlated with aggressive behaviour in the teenage years. Researchers are most interested in cross-lag correlations, which show whether the earlier measure of one variable is associated with the later measure of another variable. In the research example, scientists wanted to know if watching violent shows at a young age was associated with aggressive behaviour in the teen years and if aggressiveness at a young age was associated with watching violent programmes in the teen years. Cross-lag correlations show how people change through time and they help establish temporal precedence. Eron’s research showed only one significant cross-lag correlation: children who had a preference for violent programmes at a young age were more aggressive in their teenage years. Children who were aggressive at a young age did not show a preference for violent programmes in their teen years. These results suggest that the preference for violent programmes came first.
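The three kinds of correlations described above can be sketched in a few lines of Python. This is only an illustration: the function names and the scores for six children are made up for this summary and are not Eron's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def longitudinal_correlations(tv_t1, agg_t1, tv_t2, agg_t2):
    """Return the six correlations of a two-variable, two-wave design."""
    return {
        # Same time point, different variables
        "cross_sectional": {
            "time1": pearson_r(tv_t1, agg_t1),
            "time2": pearson_r(tv_t2, agg_t2),
        },
        # Same variable, different time points
        "autocorrelation": {
            "tv": pearson_r(tv_t1, tv_t2),
            "aggression": pearson_r(agg_t1, agg_t2),
        },
        # Earlier measure of one variable with later measure of the other
        "cross_lag": {
            "tv1_to_agg2": pearson_r(tv_t1, agg_t2),
            "agg1_to_tv2": pearson_r(agg_t1, tv_t2),
        },
    }

# Hypothetical scores for six children (NOT real data)
tv_t1 = [1, 2, 3, 4, 5, 6]    # preference for violent TV, time 1
agg_t1 = [2, 1, 4, 3, 6, 5]   # aggression, time 1
tv_t2 = [2, 1, 3, 5, 4, 6]    # preference for violent TV, time 2
agg_t2 = [1, 2, 3, 4, 5, 6]   # aggression, time 2

result = longitudinal_correlations(tv_t1, agg_t1, tv_t2, agg_t2)
for kind, values in result.items():
    print(kind, {name: round(r, 2) for name, r in values.items()})
```

Comparing the two cross-lag values is what gives a hint about temporal precedence: if only the correlation from early TV preference to later aggression is sizeable, the pattern is consistent with TV preference coming first.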

What about the three criteria of causality in longitudinal studies?

Longitudinal research can help meet some criteria for causality. Correlations in research show there is covariance. Longitudinal studies show temporal precedence, because each variable is measured at at least two time points. Researchers are able to see what the different patterns are and they can decide whether variable x or y came first. However, longitudinal studies can not exclude third variables. When you only look at two variables on two time points, you can not say anything about a third variable that might influence the relation. However, researchers can arrange a longitudinal study in such a way, that they are able to exclude some third variables. In Eron’s research, boys and girls were studied separately. Eron tried to exclude a third variable - gender.

Some people will ask why researchers of longitudinal studies go to such lengths to find their original participants ten years later and why they do not just choose to conduct an experiment. The reason is that people cannot always be assigned to a condition. Sometimes it is difficult to manipulate variables. Also, in some cases it might be unethical to assign a person to a certain condition.

In what way can multiple-regression designs exclude third variables?

One study showed that giving children long breaks was associated with less problematic behaviour. But what is the causal link? Do children behave better because they have had longer breaks or are they nice children rewarded with longer breaks? You also have to think about possible third variables. It is possible that there are one or two variables that have influenced the relationship between the length of the breaks and good behaviour. With multiple regression analysis, it is possible to exclude some third variables. Barros and her colleagues asked teachers of different schools to indicate how long a typical break is and also asked to fill in a questionnaire about problematic behaviour in children. Researchers also looked at how many children were in class, the income of the parents and whether the school is a private or public school. This is what makes a study a multivariate correlational study.

With multivariate designs, a researcher can see whether a relation between two variables stays intact when a third variable is controlled. You can split a third variable into different subgroups. Suppose you take income of parents as the third variable. You can split this in low income, middle income and high income. Then you can look if the relation between problematic behaviour and length of break stays intact in these subgroups.

What statistical measures are used in multiple regression designs?

First, a researcher needs to decide which variable is most interesting. This is called the dependent variable or criterion variable. In the study about breaks and problem behaviour, researchers were most interested in problem behaviour. The other variables are called independent variables or predictor variables. When you run a regression in SPSS, you will get a regression table. In that table, you have to look at the beta values. Beta shows the direction and strength of the relation between a predictor and the criterion variable while the other predictor variables are controlled. It resembles r, but it adds an extra dimension. A negative beta shows a negative relation and a positive beta shows a positive relation. A beta further from zero means a stronger relation than a beta closer to zero. Beta is standardized: the predictor variables are all expressed in standardized units, so their betas can be compared with each other. A beta value can change when other predictor variables are added. Next to the column of beta values, there is a column for the p-value, which shows the significance of beta. When p is lower than .05, beta is significant. When p is equal to or higher than .05, beta is not significant. That means that the association found between the predictor variable and the criterion variable could have come about by chance, so we cannot conclude that it exists in the population.

What if you look at multiple variables that may have an influence on the relation between criterion and predictor variables? The same rules apply for beta. The beta value of a variable says something about the relation between that predictor variable and the criterion variable, controlling for the other predictor variables in the model. Adding more predictor variables to a model lets a researcher control for more potential third variables. Betas also let you see which predictors have a stronger relation with the dependent variable. Look at the beta and do not confuse it with the unstandardized b. That value is also shown in a regression table, but it is expressed in the original units of each predictor. For that reason, b values cannot be compared across predictors. Terms like ‘controlling for other variables,’ ‘taking other variables into account,’ and ‘correcting for other variables’ show that multiple regression is used in the research.
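To see why beta "looks like r but adds an extra dimension", it can help to compute the betas for the simplest case of two predictors. The sketch below uses the standard closed-form expression for two-predictor standardized regression; the function name and the three correlations are hypothetical values chosen for illustration, not results from the study on breaks.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized betas for a regression with exactly two predictors.

    r_y1: correlation between criterion and predictor 1
    r_y2: correlation between criterion and predictor 2
    r_12: correlation between the two predictors
    """
    denom = 1 - r_12 ** 2
    if denom == 0:
        raise ValueError("Predictors are perfectly correlated; betas are undefined.")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Hypothetical correlations (assumed for this example only):
# problem behaviour with break length, problem behaviour with class size,
# and break length with class size.
beta_break, beta_class = standardized_betas(r_y1=-0.30, r_y2=0.40, r_12=-0.50)
print(round(beta_break, 3), round(beta_class, 3))  # -0.133 0.333
```

In this made-up case the bivariate correlation of -.30 shrinks to a beta of about -.13 once class size is controlled, which shows how a beta can differ from r and can change when another predictor enters the model. When the two predictors are uncorrelated, each beta simply equals the corresponding r.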

Can regression establish causality?

Multiple regression designs can exclude some third variables, but they cannot establish temporal precedence. They also cannot control for third variables that were not measured in the study. Researchers might not know of a certain variable that can influence the relationship between predictor and criterion variable. This variable will therefore not be included in the model and the results might be biased. The problem with third variables can only be solved by conducting experiments. By random assignment, you take potential third variables out of the equation. Only experiments can establish causality.

What do pattern and parsimony mean for causality?

Longitudinal studies establish temporal precedence. Multiple regression analyses help rule out some third variables. In correlational research, pattern and parsimony can be used to strengthen a causal claim. Parsimony is the degree to which a proper scientific theory gives the simplest explanation for a phenomenon. In causal claims, parsimony refers to the simplest explanation for a pattern of data. That is the best explanation, the one for which you need the fewest exceptions or qualifications.

Decades ago people saw that more smokers than non-smokers had lung cancer. Multiple regression analysis can exclude certain third variables, but cannot exclude variables you have not put in the study. Researchers were also not allowed to conduct experiments, because it is unethical to assign participants to a smoking condition. The only data researchers had were data from correlational studies. With the correlational data researchers had to come up with a simple mechanism. The most logical explanation was that cigarette smoke contains chemicals that become poisonous when they come into contact with human tissue. The more a person comes into contact with these chemicals, the greater his or her exposure to their toxic effects. With this, researchers could make predictions and test them. Scientists often combine methods and results to develop and test causal theories. Journalists should not just present a part of the study, but they should also state what previous studies have found and describe the context of the study.

What about mediation in multivariate regression?

When a relationship is established, scientists want to go further. They ask themselves why something is happening. These explanations for causal relationships often contain a mediator. When variable x has a direct influence on variable y, but also can go through variable z and indirectly influence variable y, we call variable z a mediator. A study does not have to be correlational to contain a mediator. In experimental research, mediators can be present. It is often the case that a mediator can be properly analyzed by multivariate methods. Mediators are similar to third variables. They can both be studied by multiple regression. However, they do differ on some points. A third variable is external to the two variables in the original bivariate relation and it is often seen as a disturbing variable. A mediator is internal to the causal relation and in research it is seen as an interesting variable. Do not confuse mediator and moderator with each other.

What about the four validities in multivariate designs?

Multiple regression analyses help with the third variable problem, longitudinal research establishes temporal precedence and multivariate designs have some evidence for internal validity. For multivariate designs, it is also important to check the construct validity by looking at how well each variable was measured. In order to measure the external validity, researchers can look at the participants. Are they chosen randomly? Are people from different groups of society used? Statistical validity can also be researched by looking at the statistical data that researchers have presented. What about the effect sizes and significance? They should also look at outliers and curvilinear relationships.

How can causal claims be evaluated with the help of experiments? - Chapter 10

What are the variables in an experiment?

Experiment means that a researcher manipulates at least one variable and measures another variable. Experiments can be conducted in a laboratory or anywhere else. A manipulated variable is a variable that can be controlled. A researcher can assign someone to a certain condition of the variable. Measured variables are recorded measures of behaviours and attitudes, like self-reports, behavioural observations or physiological measures. During the experiment, researchers record what happens. In an experiment, the manipulated variable is the independent variable. The measured variable is the dependent variable. Researchers have less control over the dependent variable than over the independent variable. They manipulate the independent variable and see what happens with the dependent variable. When the values are shown in a graph, the independent variable is on the x-axis and the dependent variable is on the y-axis. When the researchers manipulate an independent variable, they have to make sure that only one thing varies at a time. Researchers also have to control for potential third variables by keeping all other factors constant across the levels of the independent variable. Every variable a researcher intentionally keeps constant is called a control variable. Strictly speaking, control variables are not variables, because they do not vary: their levels are kept constant. These control variables are essential in experiments. They enable a researcher to separate the cause from a potential other cause and in that way, they eliminate alternative explanations of the results. Control variables are important for internal validity.

Why do experiments support causal claims?

Researchers can support causal claims with the help of experiments. The three rules for causality have been mentioned previously. Experiments comply with these three rules of causality.

What about the covariance and temporal precedence in experiments?

In experiments, there are comparison groups. Experiments are better sources of information than your own experience, because you cannot compare your own experience to something else. Experiments manipulate an independent variable and every independent variable has at least two levels, so true experiments are always set up to look for covariance. An independent variable can show covariance in different ways. A control group is a level of the independent variable that represents ‘no treatment’ or a neutral condition. When a researcher has a control group, the other conditions are called treatment groups. Temporal precedence can also be established in experiments. That is because researchers first manipulate the independent variable and then measure the dependent variable. An experiment first presents the cause, then measures the effect. Because of this, experiments are superior to correlational designs for supporting causal claims.

What about the internal validity of experiments?

Internal validity is important to causal claims. A study has good internal validity when it assures that the causal variable and no other factors are responsible for the changes of the outcome variable. Alternative explanations are called confounds and they form a threat to internal validity. There are different confounds for internal validity.

A design confound is a researcher’s mistake in the design of the independent variable. It is a second variable that varies systematically along with the independent variable of interest. It can therefore be seen as an alternative explanation of the results and that is not a good thing. With a design confound, an experiment has poor internal validity and it can therefore not support causal claims. You do have to be careful with stating that a study has a design confound. Not all potentially problematic variables are confounds. A variable is only a design confound when it shows systematic variability with the independent variable. Suppose there are two conditions and in both conditions participants have to solve anagrams. If the anagrams in one condition were systematically easier than those in the other condition, then that is a design confound. If differences such as participants’ skill in solving anagrams are spread unsystematically across both conditions, they are not a confound.

A selection effect occurs in an experiment when the type of participants in one level of the independent variable is systematically different from the participants in another level of the independent variable. Selection effects can occur when researchers allow the participants to choose in which group they want to be. Selection effects can also result from researchers assigning one type of people to one condition and another type of people to another condition.

Good experiments use random assignment to eliminate selection effects. In some studies, a researcher can roll a die to determine to which condition a participant will be assigned. In that way, everyone has an equal chance of being assigned to a particular condition. Participants might differ in their motivation, intelligence and other things, and random assignment allows for a more equal spread of participants across the conditions. The experimental groups will be quite similar.

However, random assignment does not always work perfectly. This is usually the case with small groups. Researchers can then decide to use matched groups. In order to use matched groups, researchers need to measure a variable that might be important for the dependent variable. This can for instance be IQ. When you have four groups, you can look at the four participants with the highest IQ. From that matched set, each person is randomly assigned to one of the four groups. Then you look at the next four participants with the highest scores and do the same thing, and so on. Matching supports random assignment and creates equal groups. The disadvantage of matched groups is that you need to perform an extra step - in the case of this example, you first need to have the participants do an IQ test.
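The matching procedure described above can be written as a short routine. This is a minimal sketch under stated assumptions: the function name, the participant labels and the IQ scores are all hypothetical, and the seed is only there to make the example repeatable.

```python
import random

def matched_assignment(scores, n_groups, seed=None):
    """Assign participants to groups using matched sets.

    scores: dict mapping participant -> matching score (for example IQ)
    Returns a dict mapping group index -> list of participants.
    """
    rng = random.Random(seed)
    # Rank participants from highest to lowest matching score
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = {g: [] for g in range(n_groups)}
    # Walk through the ranking in sets of n_groups similar participants
    for start in range(0, len(ranked), n_groups):
        matched_set = ranked[start:start + n_groups]
        group_ids = list(range(n_groups))
        rng.shuffle(group_ids)  # random assignment within the matched set
        for participant, group in zip(matched_set, group_ids):
            groups[group].append(participant)
    return groups

# Hypothetical IQ scores for eight participants
iq = {"P1": 128, "P2": 125, "P3": 121, "P4": 119,
      "P5": 104, "P6": 101, "P7": 99, "P8": 95}

groups = matched_assignment(iq, n_groups=4, seed=1)
for group, members in groups.items():
    print(group, members)
```

Each group ends up with one participant from the four highest scorers and one from the four lowest, so the groups are similar on IQ while chance still decides who goes where.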

What are independent-groups designs?

In an independent-groups design, different groups of participants are assigned to different levels of the independent variable. This is also called a between-group design. In a within-groups design (also called a within-subjects design), there is just one group of participants and every person will be exposed to every level of the independent variable. Two forms of the independent-groups design are the posttest-only design and the pretest/posttest design. In the posttest-only design, participants are assigned randomly to groups of the independent variable and they are tested only once for the dependent variable. The posttest-only design meets all three criteria of causality. In a pretest/posttest design, participants are randomly assigned into a group and they are tested twice on the dependent variable: once before the exposure with the independent variable and once after the exposure with the independent variable. Researchers can use a pretest/posttest design if they want to evaluate whether random assignment has made the groups equal. This is especially done with small groups and this ensures researchers that there are no selection effects. A pretest/posttest design can also show how participants in the experimental condition have changed over time. A pretest/posttest design is handy, but it can not always be done. However, the posttest-only design is already a good way to conduct research.

What are within-groups designs?

There are two types of within-groups designs. In the concurrent-measures design, participants are exposed to all levels of an independent variable at the same time and one behaviour or preference is the dependent variable. In one study, researchers wanted to know if babies preferred to look at female faces or male faces. They had babies look at pictures of males and females at the same time. Researchers measured which faces the babies looked at the longest. The independent variable was the sex of the face and the babies were exposed to both levels of the independent variable at the same time. The preference of the babies was the dependent variable. In a repeated-measures design, participants are measured more than once on the dependent variable - so after exposure to each level of the independent variable.

The advantage of a within-groups design is that it guarantees that the participants in the conditions are equivalent, because they are the same participants. Every participant can be compared with himself/herself. Each person serves as his/her own control. With that design, researchers can state with more power that there is a difference between the conditions. Because individual differences are kept constant, it is more likely that researchers will find an effect of the manipulation of the independent variable, if there actually is one. Power is the probability that a study will show a statistically significant result when an independent variable truly has an effect in the population. A within-groups design is also seen as an efficient way to do research, because you need fewer participants than in most other designs.

What about the three criteria of causality in within-groups designs?

Within-groups designs can sometimes be bad for internal validity. Being exposed to one condition can change how a participant reacts to other conditions. These responses are called order effects. Order effects occur when the exposure to one level of the independent variable influences the responses to the next level of the independent variable. These order effects are confounds. Order effects can consist of practice effects. These effects are also called fatigue effects. A long sequence can result in someone getting better at a task or getting bored by the end of the task. Order effects can also include carryover effects. A form of contamination can carry over from one condition to the next. After you have brushed your teeth, apple juice will taste different than normal.

In order to prevent order effects, researchers can use counterbalancing. This means that researchers present the different levels of the independent variable in different sequences to participants. When researchers want to use counterbalancing, they must divide participants in groups. Each group will get one of the sequences. With random assignment, one group is assigned one sequence and the other groups the other sequences. An experiment can be fully or partially counterbalanced. When a within-groups experiment has only two or three levels of an independent variable, researchers can use a full counterbalance. In this type, all possible sequences are used.

When the number of conditions increases, the number of possible sequences increases drastically. When researchers want a couple of people in a sequence, they need a lot of participants. Therefore, a full counterbalance is not always practical. In a partial counterbalance, only a few of the possible sequences are presented. The conditions can be randomly presented to the participants.
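A quick calculation shows how fast the number of sequences grows. The sketch below lists every order for a full counterbalance and, as one possible way to pick a few sequences for a partial counterbalance, builds a simple cyclic Latin square. The Latin square is an assumption added for illustration; the text above only says that a subset of sequences is used. The function names are hypothetical.

```python
from itertools import permutations
from math import factorial

def full_counterbalance(conditions):
    """Every possible order of the conditions (n! sequences)."""
    return [list(order) for order in permutations(conditions)]

def cyclic_latin_square(conditions):
    """One simple partial scheme: n sequences in which each condition
    appears exactly once in each position (illustrative assumption)."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]

three = ["A", "B", "C"]
print(len(full_counterbalance(three)))      # 6 sequences for 3 conditions

for n in range(2, 7):
    print(n, "conditions ->", factorial(n), "sequences")

for sequence in cyclic_latin_square(["A", "B", "C", "D"]):
    print(sequence)
```

With six conditions there are already 720 possible sequences, which is why a full counterbalance quickly becomes impractical. Note that a cyclic square balances position only; it does not balance which condition follows which.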

Within-groups designs can establish covariance, they can establish temporal precedence and, if order effects are controlled for, their internal validity is good as well. Sometimes researchers do not choose a within-groups design. One of the reasons is order effects. Another disadvantage of these designs is that they are not always practical. A third problem occurs when people see all the different levels of an independent variable and change their behaviour accordingly.

What do the four validities state about causal claims?

Construct validity says something about how well the variables have been measured and manipulated. When you look at the construct validity of the experiment, you must look at both the independent and dependent variables. Sometimes the researchers use a manipulation check to see whether the construct validity of their independent variable is good. Pilot studies can also be used to see whether the manipulations are effective. Pilot studies are studies that need a few participants and they are conducted before the actual study. Researchers can also show that the results support their theory by gathering additional data.

If you want to examine the external validity of causal claims, you have to look how participants have been chosen. If it is by random sampling, then the external validity is good. It is often the case that external validity is not a top priority for the researchers who conduct a study. Internal validity is more important for an experiment and if both types of validity can not be guaranteed, researchers choose internal validity over external validity.

With statistical validity of experiments, one must look at the effect size, d. This number shows how big the difference between the groups is on the dependent variable. It shows the distance between the means of the groups and how much the scores overlap. It takes into account both the difference between the group means and the spread of scores within the groups. A higher d goes together with a higher r.
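As an illustration of how d combines the difference between means with the spread within groups, here is a minimal sketch using the pooled standard deviation. The scores are invented, the function names are hypothetical, and the conversion from d to r assumes two groups of equal size.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

def d_to_r(d):
    """Approximate conversion of d to r, assuming equal group sizes."""
    return d / sqrt(d ** 2 + 4)

# Hypothetical scores on the dependent variable
treatment = [4, 5, 6, 7, 8]
control = [2, 3, 4, 5, 6]

d = cohens_d(treatment, control)
print(round(d, 2), round(d_to_r(d), 2))  # 1.26 0.53
```

The same difference of two points between the means would give a smaller d if the scores within each group were more spread out, because the groups would then overlap more.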

If the internal validity of an experiment is good, then you can be quite sure that your causal claim is accurate.

Where and how can we determine the influence of confounding and obscure factors? - Chapter 11

The biggest threats for internal validity are design confounds, selection effects and order effects.

What are other threats for internal validity?

A maturation threat is a change in behaviour that occurs spontaneously over time. People adapt to their environment, people become better at certain actions and children learn to talk properly. That just ‘happens’ and no intervention is necessary to cause this change. To rule out the threat of maturation, control groups need to be used. Sometimes changes occur because something specific has happened between the pretest and the posttest. This is called a history threat. In order to count as a history threat, the event needs to have an influence on everyone or almost everyone in a group. History threats can be ruled out with the use of control groups. Regression threats refer to regression to the mean. When a behaviour is extreme at time point 1, it will probably be less extreme at time point 2. Extremity is usually explained by fortunate or unfortunate random events. Regression threats only occur in a pretest/posttest design and when a group scores extremely on the pretest. These threats can be ruled out by the use of control groups.
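Regression to the mean can be made visible with a small simulation. The sketch below is built on assumptions chosen for illustration: each observed score is a stable true score plus random luck, no treatment is given at all, and the sample size, means and spreads are arbitrary.

```python
import random

def simulate_regression_to_mean(n=2000, top_fraction=0.1, seed=42):
    """Select the most extreme pretest scorers and compare their
    pretest mean with their posttest mean (no intervention applied)."""
    rng = random.Random(seed)
    # Stable ability plus independent random luck at each measurement
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    pretest = [t + rng.gauss(0, 10) for t in true_scores]
    posttest = [t + rng.gauss(0, 10) for t in true_scores]
    # Indices of the participants with the highest pretest scores
    cutoff = int(n * top_fraction)
    extreme = sorted(range(n), key=lambda i: pretest[i], reverse=True)[:cutoff]
    pre_mean = sum(pretest[i] for i in extreme) / cutoff
    post_mean = sum(posttest[i] for i in extreme) / cutoff
    return pre_mean, post_mean

pre_mean, post_mean = simulate_regression_to_mean()
print(round(pre_mean, 1), round(post_mean, 1))
```

The group selected for its extreme pretest scores scores closer to the overall average on the posttest even though nothing was done to them. A comparison group selected in the same way would show the same drop, which is why control groups help with this threat.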

Attrition is a reduction in the number of participants that takes place before the research is finished. For instance, this can happen between the pretest and the posttest. Attrition is a problem when it is systematic. That means that it causes a problem when a certain type of participant is no longer participating. When these participants are not present, the results can be biased. Researchers can exclude the data of participants who drop out before the end of the study. A testing threat refers to a change in participants because they have taken the test more than once. People can become better at taking a test or they can get bored. Researchers can prevent this by using alternative forms of the test for the two measurements. An instrumentation threat occurs when a measuring instrument changes over time. Observers should not change their standards between two measurements.

What are the three big threats for internal validity in experiments?

Even if you add control groups, there can still be threats to the internal validity of your experiment. Three of these threats are observer bias, demand characteristics and placebo effects. Observer bias occurs when the expectations of the researcher influence his or her interpretation of the results. Observer bias can also form a threat to construct validity. Demand characteristics form a problem when participants think they know what the study is about and change their behaviour accordingly. To prevent observer bias and demand characteristics, researchers should conduct studies that are double-blind. This means that neither the participants nor the researchers who evaluate them know which condition the participants are in. When double-blind research is not possible, researchers can conduct a masked design study. This means that participants know which condition they are in, but the researcher does not know which group the participant is in. A placebo effect occurs when participants receive a treatment and improve because they think that they have received a real treatment. Placebo effects are not imagined and researchers have shown that placebo effects can be psychological and physical. Placebo effects are not always positive. You often hear that people become less depressed because they think that they have received a pill that has cured them. However, placebo effects can also include nasty side-effects, like rash and headaches. In order to rule out placebo effects, it is good to conduct double-blind placebo control studies. In those studies, neither the participants nor the researchers who give the pill know which condition the participant is in.

What do null effects mean?

A null effect means that the independent variable does not appear to influence the dependent variable. Null effects occur quite frequently. A null effect can also occur because the research was not conducted or set up accurately. The independent variable might influence the dependent variable, but because of some obscuring factor, the researchers could not detect any difference. The obscuring factor can come in two forms: there was not enough difference between groups or there was too much variability within groups.

What happens when there is not enough difference between groups?

Weak manipulations, insensitive measures and reverse design confounds can cause insufficient differences between groups. When a null effect occurs, a researcher has to look at the data and at how the variables were operationalized. One must also look at the construct validity to check for weak manipulations. Maybe the manipulation groups should have differed more from each other. Sometimes null effects are found because researchers have not operationalized the dependent variable with enough sensitivity. When all groups score high on the dependent variable, we call that a ceiling effect. When all groups score low on the dependent variable, we call that a floor effect. Suppose you have three different groups of participants that all get the same test. What if that test is so difficult that almost no one can get a good score? All participants will have a low score. That is because the test was too difficult, so you cannot tell whether the conditions of the independent variable influenced the test scores. A floor effect has occurred in that example. Manipulation checks can help spot weak manipulations.

What happens when there is too much variability within groups?

This is called noise or error variance. Because of great variability within a group, a true difference between groups might not be detected. When there is much variability in group A, many participants from group A can have scores similar to those of participants from group B. This causes a statistical validity problem: the more the groups overlap, the smaller the effect size and the less likely it is that the difference between the group means is statistically significant.
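
The link between within-group variability and effect size can be illustrated with a small Python sketch. It is not taken from the book: the group means and standard deviations are invented, and Cohen's d (the difference between two means divided by the pooled standard deviation) is assumed here as the effect size measure.

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b):
    """Standardized mean difference for two equal-sized groups."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Hypothetical numbers: the same 5-point difference between group means.
low_noise = cohens_d(75, 70, 5, 5)     # little variability within groups
high_noise = cohens_d(75, 70, 20, 20)  # much variability within groups

print(low_noise)   # 1.0
print(high_noise)  # 0.25
```

The difference between the means is identical in both cases; only the noise within the groups makes the second effect size four times smaller.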

One reason for great variability within a group is measurement error. Measurement error is any factor that can inflate or deflate a participant's true score on the dependent variable. A man who is 1.80 meters tall can be measured at 1.79 because he is not standing straight. All measures of dependent variables contain some measurement error, but researchers try to keep it as low as possible. The more sources of random error are present in the measurement of a dependent variable, the more variability there will be within a group of participants. Measurement error can be reduced by using reliable and precise techniques and measures. When it is difficult to find a proper instrument, it is good to take more measurements. In other words, more participants need to be included in the research. The more participants there are, the higher the chance that random errors will cancel each other out.
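
Why more participants help can be sketched with the standard error of the mean, which equals the standard deviation divided by the square root of the sample size. This formula comes from general statistics rather than from the text above, and the standard deviation of 10 is an invented value.

```python
import math

def standard_error(sd, n):
    """Standard error of a group mean based on n participants."""
    return sd / math.sqrt(n)

# Hypothetical standard deviation of 10 on the dependent variable.
errors = {n: standard_error(10, n) for n in (4, 16, 64, 256)}

for n, se in errors.items():
    print(n, se)
# 4 5.0
# 16 2.5
# 64 1.25
# 256 0.625
```

Every time the sample becomes four times larger, the random error in the group mean is cut in half.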

Individual differences can also cause variability within groups. One way to take this into account is to use a within-group design, in which every participant takes part in both conditions of the independent variable. In such a design, every person serves as his or her own comparison, so individual differences are controlled for. Within-group designs also need fewer participants. A similar result can be obtained with matched groups. When there are two conditions, the researcher matches pairs of similar people and compares the scores on the dependent variable within these pairs. When it is not possible to use a within-group design or a matched-groups design, researchers need to find more participants.

A third factor that can cause variability within groups is situation noise: any factor in the setting that adds variability within a group and can thereby hide real differences between groups. Researchers try to reduce situation noise by conducting the experiment in a calm place. So, they try to avoid places where you can hear people talking or places with distracting smells. Researchers often do their best to reduce potential distractors that could influence the dependent variable. Reducing noise, for instance by using a within-group design, enhances the power of a study. A study that has a lot of power detects true patterns more easily.

If you find a null effect, you have to check whether your manipulations were strong enough, whether the variables were operationalized properly, whether there was much measurement error, whether there were enough participants and whether you had enough control over the situation. When all those things are in order and the power is good, your null effect suggests that the independent variable really does not influence the dependent variable.

The biggest threats for internal validity are design confounds, selection effects and order effects.

How to deal with experiments that have more than one independent variable? - Chapter 12

What are interactions in research?

Researchers might be interested in more than one independent variable from the beginning of the experiment, or they might want to add variables after they have looked at the results. When researchers ask about the effect of an extra independent variable, they are usually interested in an interaction effect. An interaction effect shows whether the effect of the original independent variable depends on the level of another independent variable. An example of this is hands-free phoning while driving and reaction time. Researchers wanted to know if young people show a longer reaction time than older people while driving and talking to someone on a hands-free phone. Previous research had already shown that talking on the phone while driving causes someone to react more slowly to obstacles on the road. In that research, there was just one independent variable (the use of a mobile phone). Then, researchers wanted to know if the effect depended on age. That became the second independent variable. An interaction effect can be described as a difference of differences.

The thoughts, behaviours, emotions and motivations of people are complicated, and there are many interactions between them. Suppose you are asked if you like hot or cold food more. You will probably answer that it depends on the type of food. You want your soup to be hot, while you want your ice cream to be cold. The type of food is one independent variable and the temperature of the food is another. If you put this into a graph, you would see an interaction: the two lines cross each other. This type of interaction is called a crossover interaction. When the lines are not parallel but do not cross each other, we speak of a spreading interaction. When an interaction is present, you can describe it in either direction. It does not matter which independent variable you put on the x-axis.

Which design can be used to test two variables?

Researchers use factorial designs to test interactions. A factorial design is a design with two or more independent variables (called factors). Usually, the independent variables are crossed. That means that researchers test every possible combination of the levels of the independent variables. In the example of talking on the phone while driving, there are two factors: age and phone use. When the two independent variables are crossed, there are four conditions: old people who drive and phone, old people who drive and do not phone, young people who drive and phone, and young people who drive and do not phone. There are two independent variables and each variable has two levels (young vs. old and phoning vs. not phoning). This design is therefore also called a 2 x 2 design. Factorial designs can be used to test manipulated variables (phoning or not phoning) and participant variables (age).

Can factorial designs be used to test limits and theories?

Factorial designs are used to test whether an independent variable influences different kinds of people, or people in different situations, in the same way. The study about phoning while driving also had a factorial design. No interaction was found between the independent variables. That means that the effect of phoning on reaction time did not differ between young and old drivers. Testing limits in research is similar to testing external validity. When an independent variable is tested in more than one group, researchers test whether the effect generalizes. In the example of reaction time and phone use, both groups reacted in the same way, so the effect generalizes to drivers of all ages. Of course, there are studies in which groups react differently to an independent variable. When you use factorial designs to test limits, you are also looking for moderators. A moderator is a variable that influences the relationship between the independent variable and the dependent variable. A moderator results in an interaction. Factorial designs are not just used to test the generalizability of an effect, but also to test theories.

How can factorial results be interpreted?

In an analysis with two independent variables, you can inspect three things: two main effects and one interaction effect. A main effect is the overall effect of one independent variable. To study main effects, researchers look at marginal means. A marginal mean is the average for one level of an independent variable, averaged over the levels of the other independent variable. Researchers use statistics to test whether the difference between marginal means is statistically significant. You should not think that the term 'main effect' means that it is the most important effect. It is better to see it as an overall effect. When an interaction is present, the interaction effect is the most important. Main effects are differences and an interaction effect is a difference of differences. If you compute the difference between the levels of one independent variable at each level of the other and these differences are not the same, there might be an interaction. With the help of statistics, you can find out if this difference of differences is statistically significant. In figures, interactions are easier to detect. When the lines in a graph are parallel, there is probably no interaction, and when they are not parallel, there probably is one. Of course, you have to confirm this with statistics. You can also spot interactions in a bar graph. When you draw lines connecting the bars of the same levels and these lines are not parallel, there might be an interaction. When both main effects and an interaction effect are present, you should pay more attention to the interaction effect.
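
The arithmetic behind marginal means, main effects and the difference of differences can be sketched in Python. The cell means below are invented liking ratings for the hot and cold food example. They are not data from a real study, and the sketch does not test statistical significance.

```python
# Hypothetical cell means (liking ratings from 1 to 10) for a 2 x 2 design.
cells = {
    ("soup", "hot"): 9,
    ("soup", "cold"): 3,
    ("ice cream", "hot"): 2,
    ("ice cream", "cold"): 8,
}

def marginal_mean(factor_index, level):
    """Average the cells of one level over the levels of the other factor."""
    values = [mean for key, mean in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)

# Main effects: differences between marginal means.
food_effect = marginal_mean(0, "soup") - marginal_mean(0, "ice cream")
temperature_effect = marginal_mean(1, "hot") - marginal_mean(1, "cold")

# Interaction: the difference of the differences.
diff_soup = cells[("soup", "hot")] - cells[("soup", "cold")]
diff_ice_cream = cells[("ice cream", "hot")] - cells[("ice cream", "cold")]
interaction = diff_soup - diff_ice_cream

print(food_effect)         # 1.0
print(temperature_effect)  # 0.0
print(interaction)         # 12
```

In these made-up numbers temperature has no main effect at all, yet the interaction is large. That is why the interaction deserves the most attention when it is present.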

What factorial varieties exist?

We discussed the 2 x 2 design, but researchers can also choose an independent variable that has more than two levels, or they can use three independent variables. In an independent-groups factorial design (between-subjects design), both independent variables are studied as independent groups. In a 2 x 2 factorial design of this type, there are four different groups of participants in the experiment. In a within-groups factorial design (repeated measures design), both independent variables are manipulated within groups. In a 2 x 2 factorial design of this type, there is one group of participants and all these participants take part in all four cells of the design. In a mixed factorial design, one independent variable is manipulated between independent groups and the other is manipulated within groups.

What happens when the number of levels or independent variables increases?

When one of the independent variables has three levels and the other has two, we call it a 2 x 3 design. There will be 2 x 3 = 6 cells. Of course, many other combinations are possible. When independent variables have more than two levels, researchers can still study main effects and interaction effects by calculating the marginal means and determining whether they differ from each other. The easiest way is to make a line graph (for example in SPSS) and to see if the lines are parallel. Of course, you have to check if the effect is significant. When researchers add a third independent variable and all independent variables have two levels, we call this a 2 x 2 x 2 factorial design, or a three-way design. In this design, there are 2 x 2 x 2 = 8 conditions. The best way to depict this design is to construct the table of your original 2 x 2 study twice, once for each level of the third independent variable. You could also depict it by showing two line graphs next to each other. In a three-way design, there can be three main effects, three two-way interactions and one three-way interaction. A three-way interaction means that the two-way interaction between two of the independent variables depends on the level of the third independent variable.
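
Counting the cells and the possible effects of a factorial design can be sketched in Python. The level "middle-aged" and the factor "traffic" are hypothetical additions that serve only as an illustration.

```python
from itertools import combinations, product

def list_cells(design):
    """Cross all levels of all factors: one tuple per cell."""
    return list(product(*design.values()))

def list_effects(design):
    """Every main effect and interaction that can be tested."""
    factors = list(design)
    return [combo
            for size in range(1, len(factors) + 1)
            for combo in combinations(factors, size)]

two_by_three = {
    "phone use": ["phoning", "not phoning"],
    "age": ["young", "middle-aged", "old"],
}
three_way = {
    "phone use": ["phoning", "not phoning"],
    "age": ["young", "old"],
    "traffic": ["light", "heavy"],
}

print(len(list_cells(two_by_three)))  # 6
print(len(list_cells(three_way)))     # 8

# Effects with one factor are main effects, effects with two factors are
# two-way interactions and the effect with three factors is the three-way one.
for effect in list_effects(three_way):
    print(" x ".join(effect))
```

For the three-way design the loop prints seven effects: three main effects, three two-way interactions and one three-way interaction.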

How can you recognize factorial designs in articles and magazines?

In empirical articles, researchers usually state which design they have used. They often use terms such as 2 x 2 or 2 x 3. These numbers show how many independent variables are present and how many levels each variable has. Empirical articles also use terms like ‘main effect’ and ‘interaction’. Popular articles in magazines or newspapers usually do not state which design has been used. However, there are some clues that can show whether a factorial design has been used. You can look for the phrase ‘it depends on…’. This shows that a certain effect depends on the level of another variable. The use of participant variables can also be a sign that a factorial design has been used.

What are quasi-experiments? - Chapter 13

A quasi-experiment differs from a true experiment in the amount of control. In a quasi-experiment, researchers do not have full control over the conditions: participants are not randomly assigned to the conditions. An example of a quasi-experiment:

Plastic surgery is performed all over the world. People who undergo this procedure expect that their self-esteem and body image will improve after the procedure. But is that really the case? One way to find out would be to randomly assign people to a plastic surgery or a no-plastic-surgery condition. This, of course, is not ethical, because you cannot tell people to get plastic surgery just for your experiment. Instead, the researchers asked people who were already going to get plastic surgery to answer questions about their self-esteem. The self-esteem of these people was measured before the surgery and 3, 6 and 12 months after the surgery. The comparison group was a group of people who were registered at the same clinic, but who had not undergone plastic surgery. They answered questions about their self-esteem at the same time points as the first group. Participants were not randomly assigned to a condition, so this is a quasi-experiment.

What about the internal validity of quasi-experiments?

How much support a quasi-experiment can offer for a causal claim depends on its design and results. A selection effect threatens internal validity when the groups at the different levels of the independent variable contain different types of people. In that case, you cannot say for certain that the independent variable has caused a change in the dependent variable. The participants who had surgery could have been different types of people than the participants who did not have surgery. The results actually showed that the people who had the surgery were richer than the other people. However, the study used a pretest-posttest design, which helps to counter selection effects. Matched groups can also be used to compare the two groups of participants. Some researchers use a wait-list design, in which all participants get a treatment, but at different moments in time.

More problems can occur in quasi-experiments, including problems with the design. A design confound occurs when a third variable varies systematically along with the levels of the independent variable of interest. By collecting additional data, you can check whether there are design confounds. A maturation threat occurs when participants show an increase in their scores from pretest to posttest, but it is not clear whether the change is caused by the treatment or whether the group has improved spontaneously. With a comparison group, it is easier to state whether a change is caused by a treatment or has occurred spontaneously. A history threat occurs when an external event affects all participants at the same time as the treatment. The effects of history threats can be reduced with the use of comparison groups.

Regression to the mean occurs when an extreme result is caused by a combination of random factors that probably will not occur again in the same combination. The extreme result will become less extreme over time. Regression effects only form a threat to internal validity if a group is selected because of its extremely high or low scores. Attrition occurs when people stop participating in the research after a while. It is a threat to internal validity if people leave for a systematic reason. It is possible that the least satisfied people left the study about plastic surgery. The finding that plastic surgery enhances self-image might then only reflect the fact that the satisfied participants stayed in the study. It is easy to check for attrition. You only have to look at whether the people who have left differ systematically from the people who have stayed.

When participants have been tested multiple times, researchers have to be careful about testing effects. Repeated testing can result in people becoming better or worse at the test. Researchers sometimes use different but equivalent tests. They have to be sure that there is no difference in difficulty between these tests, otherwise you cannot say whether a change is caused by the treatment or by the difference in difficulty. Another threat to the internal validity of a quasi-experiment is observer bias: sometimes the expectations of the researcher influence his or her interpretation of the results. Demand characteristics can also play a role: participants might think they know what the study is about and change their behaviour accordingly.

Why would one use quasi-experiments?

Quasi-experiments can be vulnerable to threats to internal validity. One of the reasons for using quasi-experiments anyway is that they can be conducted in real-world situations. These real settings can improve external validity, because the results are more likely to generalize to the population. Quasi-experiments can also be used when one is concerned about the ethical problems of a true experiment. Some things can only be studied with quasi-experiments. Quasi-experiments also tend to show good construct validity of the independent variables.

How can research with a small number of participants be conducted?

For external validity, it is more important to select participants through random selection than to have many people in a sample. When researchers use a small N-design, instead of gathering little information from a big sample, they gather much information from a small one. They can even look at one animal or one human in so-called single N-designs. In large N-designs, participants are put in groups and the data from a single person are not of interest: the data are presented as group averages. In small N-designs, every individual is treated as a separate experiment. Usually, these are repeated measures designs in which researchers observe how an animal or human reacts in different conditions and situations. In small N-designs, the data of individuals are presented.

Which are the three different small N-designs?

Well-developed small N-design studies can help researchers find out whether changes have occurred because of interventions or because of the influence of another variable.

In a stable-baseline design, researchers observe the behaviour for a long baseline period before they start a treatment or intervention. If the behaviour during the baseline is stable, researchers can more easily say that a treatment is effective. A stable baseline helps the internal validity. In a multiple-baseline design, researchers stagger the introduction of an intervention over different times, contexts or situations. Researchers can enhance the internal validity by looking at more baselines and behaviours, and this also supports causal conclusions. Different baselines can be different behaviours within a person or different situations for a person. The baselines can also be different people. Whatever form a multiple-baseline design takes, it offers a comparison condition with which the treatment condition can be compared.

In a reversal design, a researcher observes problem behaviour with and without treatment, and then takes the treatment away for a while (the reversal period) to see whether the problem behaviour returns. If the treatment really works, the behaviour should get worse again when the treatment is taken away. In that way, internal validity is tested and causal inferences can be made. Reversal designs are only appropriate in situations in which the treatment does not cause lasting changes. It is also not always ethical to take a treatment away from a person (like taking away treatment from depressed people).

What about the four validities in small N-designs?

Is one human able to represent an entire population (external validity)? Researchers can take additional steps to increase external validity. They can use triangulation by combining the results of small or single N-designs with studies that used more participants. Sometimes researchers are not interested in generalizing to the entire population. For construct validity in small N-designs, it is important that there is more than one observer and that the interrater reliability is checked. In small N-designs, traditional statistical methods are usually not used. However, conclusions still have to be drawn from the data and the data have to be handled in an appropriate way.

How to apply the results of a study to the real world? - Chapter 14

What is replicability?

Scientists should always ask themselves whether their results are replicable. That means that the study shows the same results when it is conducted again. Replicability gives credibility to a study. Researchers often replicate their results before publishing their findings. In a direct replication, researchers repeat the original study as accurately as possible. They try to find out if the original effect can be found with new data. In a conceptual replication, researchers study the same research question but use different procedures: the variables are operationalized differently. For example, research on portion size can use pasta in the first study and chips in the replication study. In a replication-plus-extension study, researchers replicate the original study, but they also add variables to test additional questions. An example of this was the study on reaction time and phoning while driving. First, the researchers looked only at whether reaction time changed while phoning. Then they looked at whether there was a difference between young and old drivers. Introducing a participant variable is one way to conduct a replication-plus-extension study. Another way is to introduce a new situational variable. For example, you can compare data from one time point with data from another time point: researchers can test drivers who have not had training with a driving simulator and test the same people again four days after they have practiced with a driving simulator. There are many different situational variables you could add to a study. When a study cannot be replicated, it could mean that the original effect can only be found under special conditions. The importance of such an effect should be weighed carefully.

What does the literature say about meta-analyses?

Scientific literature consists of a series of related studies, conducted by different researchers, that test similar variables. Sometimes researchers collect all studies on a specific topic and turn them into a review article. One way of writing such a review article is by describing all the findings. Another way is to make a mathematical summary of the scientific literature. That is called a meta-analysis. A meta-analysis can include studies with different sample sizes. Studies with large samples are usually considered more reliable and are given more weight in the analysis. In a meta-analysis, the effect sizes are averaged to find an overall effect size. Researchers can also sort a group of studies into categories and calculate the effect sizes of these categories. However, you should be aware of publication bias in psychology. That means that significant relations are published more often than null effects. This can lead to the file drawer problem: a meta-analysis may overestimate the true size of an effect, because null effects have not been included. It would be a good idea for scientists who want to conduct a meta-analysis to contact their colleagues and ask them for both published and unpublished data. A meta-analysis is only as strong as the data in it, so you must take unpublished studies into account as well.
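
How averaging effect sizes works, and how the file drawer problem can inflate the result, can be sketched in Python. All effect sizes and sample sizes below are invented, and weighting each study by its sample size is a simplifying assumption. Real meta-analyses often use more refined weights, such as inverse-variance weights.

```python
def weighted_mean_effect(studies):
    """Average effect size, weighting each study by its sample size."""
    total_n = sum(n for _, n in studies)
    return sum(effect * n for effect, n in studies) / total_n

# Hypothetical studies as (effect size, sample size) pairs.
published = [(0.50, 40), (0.40, 100), (0.30, 60)]
file_drawer = [(0.00, 50), (0.05, 50)]  # unpublished null effects

published_only = weighted_mean_effect(published)
all_studies = weighted_mean_effect(published + file_drawer)

print(round(published_only, 2))  # 0.39
print(round(all_studies, 2))     # 0.27
```

Leaving out the two unpublished null effects makes the overall effect look larger than it is when all five studies are included.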

Does an important study need to have external validity?

Direct replication studies do not support external validity, but conceptual replication and replication-plus-extension studies can. When different methods are used to test the same thing, researchers can choose to use other participants or other settings. It is also more important to know how participants were sampled than how many participants were sampled. The similarity between the context of a study and the real world is called ecological validity. Ecological validity is an aspect of external validity. How important ecological validity is depends on the researchers’ goal. If researchers want to apply their theory only to men, their results do not have to be generalizable to women. The same goes for causal claims. In theory-testing mode, researchers only want to test an association that might support their theory. In that case, internal validity is more important than external validity. An example is the contact comfort theory: the researchers found it more important to ensure internal validity than external validity.

Psychologists can also work in generalization mode. These psychologists want to generalize the results of their samples to a bigger population. Applied research is more often done in generalization mode. Frequency claims should always be tested in generalization mode. Association and causal claims are often tested in theory-testing mode, but they can also be tested in generalization mode. Cultural psychologists are interested in how a culture influences the thinking, feelings and behaviour of individuals. Cultural psychologists often use generalization mode. They have shown that many theories that have been supported in one cultural context are not supported in another culture. This goes for the Müller-Lyer illusion (two lines that are in fact of the same length, but do not appear to be). It appears that falling for visual illusions depends on the culture you have grown up in. People who have grown up in environments with many right angles, such as rooms and buildings, appear to be more susceptible to this illusion than people who have grown up in environments with fewer right angles. Psychologists should always be aware that even basic processes can be influenced by culture. Most studies have been conducted with participants from the US, Australia and Europe. These participants are called WEIRD participants: western, educated, industrialized, rich and democratic. WEIRD people do not represent the whole world.

Should research be conducted only in real settings?

Studies that have been conducted in the real world tend to have good external validity. However, the ecological validity of a setting is only one aspect of its generalizability. A setting may be realistic, but it does not represent all settings a person may encounter. Researchers usually try to make a laboratory setting as similar as possible to a real-life setting. Emotions and behaviours shown during laboratory studies can be as real and intense as they are in real life. Many laboratory experiments are high in experimental realism. That means that they create settings in which people show real emotions, motivations and behaviours. By enhancing the ecological validity of a study, researchers can make it more likely that their findings generalize to non-laboratory settings. In studies conducted in theory-testing mode, internal validity is most important, even if it comes at the expense of external validity. External validity is not everything.
