Summary of the 4th edition of Business Research Methods by Blumberg


How does sampling work? - Chapter 6

What are the reasons for sampling?

The unit of analysis depicts the level at which the research is performed and which objects are researched. The essential application of sampling is that it allows for drawing conclusions about the entire population, by studying some of the elements in a population. A population element is the unit of study: the individual participant or object on which the measurement is taken. A population is the total collection of elements about which some conclusion is to be drawn. A census is a count of all the elements in a population. The listing of all population elements from which the sample will be drawn is called the sample frame.

There are several compelling reasons for sampling:

  1. Lower cost: the difference between the sample costs and census costs is substantial.

  2. Greater accuracy of results: some argue that the quality of a study is often better with sampling than with a census. However, when the population is small, accessible, and highly variable, accuracy is expected to be greater with a census than a sample.

  3. Greater speed of data collection: the time between the recognition of a need for information and the availability of that information is reduced.

  4. Availability of population elements: some situations simply require sampling. This is the case where, for example, the population is infinite, so conditions are not appropriate for a census study.

The advantages of sampling over census studies are less compelling when the population is small and the variability within the population high. A census study is:

  • Feasible when the population is small.

  • Necessary when the elements are quite different from each other.

However, when the population is small and variable, any sample we draw may not be representative of the population from which it is drawn. The resulting values we calculate from the sample may then be incorrect as estimates of the population values.

What does validity consist of?

The ultimate test of a sample design is how well it represents the characteristics of the population it claims to represent. Thus, the sample must be valid. Validity of a sample depends on two considerations:

  1. Accuracy.
  2. Precision.

Accuracy

Accuracy is the degree to which bias is absent from the sample. When the sample is drawn properly, the measure of behaviour, attitudes or knowledge of some sample elements will be less than the corresponding population value, while the measure for other sample elements will be more than the population value. Variations in these sample values offset each other, resulting in a sample value that is close to the population value.

Thus, an accurate (unbiased) sample is one in which the underestimators offset the overestimators.

Systematic variance has been defined as “the variation in measures due to some known or unknown influences that ‘cause’ the scores to lean in one direction more than another.” The systematic variance may be reduced by, for example, increasing the sample size.

Precision

Precision of estimate is the second criterion of a good sample design. In order to interpret the findings of research, a measurement of how closely the sample represents the population is needed. The numerical descriptors that describe samples may be expected to differ from those that describe populations because of random fluctuations natural to the sampling process. This is called sampling error (or random sampling error) and reflects the influence of chance in drawing the sample members.

Sampling error is what is left after all known sources of systematic variance have been accounted for. Precision is measured by the standard error of estimate, a type of standard deviation measurement: the smaller the standard error of estimate, the higher is the precision of the sample. The ideal sample design produces a small standard error of estimate.

What approaches of sample design are there?

Sample designs can be described along two dimensions:

  1. The basis of representation (probability versus non-probability selection).
  2. The element selection technique (unrestricted versus restricted).

Representation

The members of a sample are selected using probability or non-probability procedures. Non-probability sampling is arbitrary and subjective: when elements are chosen subjectively, there is usually some pattern or scheme used. Thus, not every member of the population has a known chance of being included.

Element selection

Whether the elements are selected individually and directly from the population (viewed as a single pool) or additional controls are imposed, element selection may also classify samples. If each sample element is drawn individually from the population at large, it is an unrestricted sample. Restricted sampling covers all other forms of sampling.

Probability sampling

Probability sampling is based on the concept of random selection: a controlled procedure that assures that each population element is given a known nonzero chance of selection. Only probability samples provide estimates of precision and offer the opportunity to generalize the findings to the population of interest from the sample population. The unrestricted, simple random sample is the simplest form of probability sampling. Since all probability samples must provide a known non-zero chance of selection for each population element, the simple random sample is considered a special case in which each population element has a known and equal chance of selection. In this section, we use the simple random sample to build a foundation for understanding sampling procedures and choosing probability samples.

What steps does a sampling design contain?

There are several questions to be answered in securing a sample. Each question requires unique information.

  1. What is the target population? Good operational definitions are critical in choosing the relevant population.

  2. What are the parameters of interest? Population parameters are summary descriptors (e.g. incidence proportion, mean, variance) of variables of interest in the population. Sample statistics are descriptors of those same relevant variables computed from sample data. Sample statistics are used as estimators of population parameters. The sample statistics are the basis of conclusions about the population. Depending on how measurement questions are phrased, each may collect a different level of data. Each different level of data also generates different sample statistics. The population proportion of incidence “is equal to the number of elements in the population belonging to the category of interest, divided by the total number of elements in the population”. Proportion measures are necessary for nominal data and are widely used for other measures as well. The most frequent proportion measure is the percentage.

  3. What is the sampling frame? The sampling frame is closely related to the population. It is the list of elements from which the sample is actually drawn. Ideally, it is a complete and correct list of population members only. An overly inclusive frame is one that includes many elements other than the ones in which the researcher is interested.

  4. What is the appropriate sampling method? A researcher must follow an appropriate method and make sure that interviewers (or others) cannot modify the selections made and only the selected elements from the original sampling are included.

  5. What size sample is needed? Some principles that influence sample size include:

  • The narrower or smaller the error range, the larger the sample must be.

  • The greater the dispersion or variance within the population, the larger the sample must be to provide estimation precision.

  • The higher the confidence level in the estimate, the larger the sample must be.

  • The greater the desired precision of the estimate, the larger the sample must be.

  • The greater the number of subgroups of interest within a sample, the greater the sample size must be, as each subgroup must meet minimum sample size requirements.

  6. How much will it cost? The costs of the study must also be taken into consideration, since budget constraints often limit the scope of the research.
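The population proportion of incidence defined in step 2 can be sketched with hypothetical data (the population list and category coding below are illustrative, not from the book):

```python
# Hypothetical population: 1 marks an element belonging to the
# category of interest, 0 marks all other elements.
population = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

# Proportion of incidence: elements in the category of interest
# divided by the total number of elements in the population.
p = sum(population) / len(population)
print(p)  # 0.4
```

The same ratio computed on sample data would be the sample statistic used as an estimator of this population parameter.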

What kinds of probability sampling are there?

Simple random sampling

Since all probability samples must provide a known nonzero probability of selection for each population element, the simple random sample is considered a special case in which each population element has a known and equal chance of selection. However, simple random sampling is often impractical: it requires a population list (sampling frame) that is often not available, and it fails to use all the information about a population, resulting in a design that may be wasteful. It may also be expensive to implement. Therefore, alternative probability sampling approaches, such as systematic sampling, stratified sampling, cluster sampling and double sampling, will be considered.
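As a minimal sketch, a simple random sample can be drawn from a hypothetical numbered sampling frame (the frame and sample sizes here are assumptions for illustration):

```python
import random

# Hypothetical sampling frame of 1,000 numbered population elements.
frame = list(range(1, 1001))

random.seed(42)  # fixed seed so the draw is reproducible
# Simple random sample: each element has a known and equal chance
# of selection; drawn without replacement.
sample = random.sample(frame, k=50)

print(len(sample))       # 50
print(len(set(sample)))  # 50 -- no element selected twice
```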

Systematic sampling

In this approach, every kth element in the population is sampled, beginning with a random start of an element in the range of 1 to k. The kth element, or skip interval, is determined by dividing the sample size into the population size to obtain the skip pattern applied to the sampling frame.

k = skip interval = total population size / size of the desired sample.

The major advantage of systematic sampling is its simplicity and flexibility. A concern with systematic sampling is the possible periodicity in the population that parallels the sampling ratio. Another difficulty may arise when there is a monotonic trend in the population elements. That is, the population list varies from the smallest to the largest element or vice versa.
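The skip-interval procedure above can be sketched as follows (the frame and sample sizes are hypothetical):

```python
import random

# Hypothetical frame of 1,000 elements; desired sample of 100.
frame = list(range(1, 1001))
n = 100

k = len(frame) // n  # skip interval: population size / sample size
random.seed(1)
start = random.randint(0, k - 1)  # random start within the first interval

# Take every k-th element, beginning at the random start.
sample = frame[start::k]
print(k, len(sample))  # 10 100
```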

Stratified sampling

Most populations can be segregated into several mutually exclusive subpopulations, or strata. Stratified random sampling is the process by which the sample is constrained to include elements from each of the segments. After a population is divided into the appropriate strata, a simple random sample can be taken within each stratum. The results from the study can then be weighted (based on the proportion of the strata to the population) and combined into appropriate population estimates.

A stratified random sample is often chosen in order to:

  • Increase a sample’s statistical efficiency.

  • Provide adequate data for analysing the various subpopulations or strata.

  • Enable different research methods and procedures to be used in different strata.

Stratification is usually more efficient statistically than simple random sampling and at worst is equal to it. With the ideal stratification, each stratum is homogeneous internally and heterogeneous with other strata. Also, the more strata used, the closer a researcher comes to maximizing interstrata differences (differences between strata) and minimizing intrastratum variances (differences within a given stratum). The size of the strata can be computed with the following pieces of information:

  • How large the total sample should be.

  • How the total sample should be allocated among strata.

Proportionate stratified sampling

In proportionate stratified sampling, each stratum is properly represented so that the sample size drawn from the stratum is proportionate to the stratum’s share of the total population. This approach has higher statistical efficiency than a simple random sample and is much easier to carry out than other stratifying methods.

It also provides a self-weighting sample; the population mean or proportion can be estimated simply by calculating the mean or proportion of all sample cases, eliminating the weighting of responses. On the other hand, proportionate stratified samples often gain little in statistical efficiency if the strata measures and their variances are similar for the major variables under study. Any stratification that departs from the proportionate relationship is disproportionate.
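Proportionate allocation can be sketched with hypothetical strata sizes (the stratum names and counts below are assumptions for illustration):

```python
# Hypothetical strata sizes and a total sample of 200.
strata = {"north": 5000, "south": 3000, "east": 2000}
total_sample = 200

population = sum(strata.values())
# Proportionate allocation: each stratum's sample size matches its
# share of the total population, so the sample is self-weighting.
allocation = {name: round(total_sample * size / population)
              for name, size in strata.items()}
print(allocation)  # {'north': 100, 'south': 60, 'east': 40}
```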

Cluster sampling

This is where the population is divided into groups of elements with some groups randomly selected for study. Two conditions foster the use of cluster sampling:

  1. The need for more economic efficiency than can be provided by simple random sampling.

  2. The frequent unavailability of a practical sampling frame for individual elements.

Statistical efficiency for cluster samples is usually lower than for simple random samples, mainly because clusters often don’t meet the need for heterogeneity and, instead, are homogeneous.

Area sampling is the most important form of cluster sampling. It can be used when the research involves populations that can be identified with some geographic area. This method overcomes the problems of both high sampling cost and the unavailability of a practical sampling frame for individual elements. In designing cluster samples, including area samples, the following questions should be answered:

  • How homogeneous are the resulting clusters? When clusters are homogeneous, this contributes to low statistical efficiency. Sometimes one can improve this efficiency by constructing clusters to increase intracluster variance.

  • Should equal-size or unequal-size clusters be sought? A cluster sample may be composed of clusters of equal or unequal size. The theory of clustering is that the means of sample clusters are unbiased estimates of the population mean. This is more often true when clusters are naturally equal, such as households in city blocks. While one can deal with clusters of unequal size, it may be desirable to reduce or counteract the effects of unequal size.

  • How large a cluster should be taken? Comparing the efficiency of differing cluster sizes requires that the different costs for each size are discovered and that the different variances of the cluster means are estimated.

  • Should a single-stage or multistage cluster design be used? For most large-scale area sampling, the tendency is to use multistage designs. Several situations justify drawing a sample within a cluster, in preference to the direct creation of smaller clusters and taking a census of each cluster with one-stage cluster sampling.

  • How large a sample is needed? It depends mainly on the specific cluster design.

Double sampling

It may be more convenient or economical to collect some information by sample and then use this information as the basis for selecting a subsample for further study. This procedure is called double sampling, sequential sampling, or multiphase sampling. It is usually found with stratified and/or cluster designs.

What sorts of non-probability sampling are there?

With a subjective approach like non-probability sampling, the probability of selecting population elements is unknown. There are a variety of ways to choose persons or cases to include in the sample. A greater opportunity for bias to enter the sample selection procedure and to distort the findings of the study exists. Any range within which to expect the population parameter cannot be estimated. There are some practical reasons for using the less precise methods.

What methods are used for sampling?

Convenience

Non-probability samples that are unrestricted are called convenience samples. They are the least reliable design but normally the cheapest and easiest to conduct. Researchers or field workers have the freedom to choose whomever they find.

Purposive sampling

A non-probability sample that conforms to certain criteria is called purposive sampling. The two major types are judgment sampling and quota sampling:

  • Judgment sampling occurs when a researcher selects sample members to conform to some criterion. When used in the early stages of an exploratory study, a judgment sample is appropriate. When one wishes to select a biased group for screening purposes, this sampling method is also a good choice.

  • Quota sampling is the second type of purposive sampling. It is used to improve representativeness. The logic behind quota sampling is that certain relevant characteristics describe the dimensions of the population. If a sample has the same distribution on these characteristics, then it is likely to be representative of the population regarding other variables on which the researcher has no control. In most quota samples, researchers specify more than one control dimension. Each should meet two tests: (1) It should have a distribution in the population that can be estimated, and (2) be pertinent to the topic studied.

Snowball

In the initial stage of snowball sampling, individuals are discovered and may or may not be selected through probability methods. This group is then used to refer the researcher to others who possess similar characteristics and who, in turn, identify others.

Finally, sampling on the Internet has increased significantly in the past decades, and almost every firm now uses the Internet to conduct research.

What is the most important information from this chapter?

The unit of analysis describes the level at which the research is performed and which objects are researched. A population element is the subject on which the measurement is being taken. A population is the total collection of elements about which we wish to make some inferences. A census is a count of all the elements in a population.

There are a couple of reasons for sampling:

  • Lower cost.

  • Greater accuracy of results.

  • Greater speed of data collection.

  • Availability of population elements.

There are two conditions for a census study, namely that it is feasible when the population is small and that it is necessary when the elements are quite different from each other. In order for a sample to be appropriate, it has to be accurate and precise. With regard to accuracy, there should be no systematic variance within a sample, which is the “variation in measures due to some known or unknown influences that ‘cause’ the scores to lean in one direction more than another”.

Probability sampling is a controlled procedure that ensures that each population element is given a known nonzero chance of selection. Non-probability sampling is arbitrary and subjective. A simple random sample is the simplest form of probability sampling. It is known as a special case in which every population element has a known and equal chance of selection.

Population parameters are summary descriptors of variables of interest in the population. Sample statistics are descriptors of the relevant variables computed from sample data. The population proportion of incidence is ‘’equal to the number of elements in the population belonging to the category of interest, divided by the total number of elements in the population’’. In addition to figuring out what the parameters of interest are, it is also important for researchers to find out about the relevant population, sampling frame, sample size and costs.

The standard error of the mean is a measure of the standard deviation of the distribution of sample means.
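As a sketch with hypothetical measurements, the standard error of the mean can be computed from the sample standard deviation and the sample size:

```python
import math
import statistics

# Hypothetical sample of eight measurements.
sample = [12, 15, 11, 14, 18, 13, 16, 17]

s = statistics.stdev(sample)     # sample standard deviation
se = s / math.sqrt(len(sample))  # standard error of the mean
print(round(se, 3))  # 0.866
```

The smaller this standard error, the higher the precision of the sample estimate.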

Systematic sampling is an approach in which every kth element in the population is sampled, starting with a random start of an element in the range of one to k. The skip interval k is determined by dividing the sample size into the population size to obtain the skip pattern applied to the sampling frame.

Stratified random sampling is the process by which the sample is constrained to include elements from each of the segments. Proportionate stratified sampling is a way of sampling in which every stratum is represented in such a way that the sample drawn from it is proportionate to the stratum’s share of the total population. Disproportionate stratified sampling is any stratification that departs from this proportionate relationship.

Cluster sampling is a way of sampling in which the population is divided into groups of elements, with some groups randomly selected for study. Area sampling is a form of cluster sampling based on geographic areas; it overcomes the problems of high sampling cost and the unavailability of a practical sampling frame for individual elements. Simple cluster sampling treats only single-stage samples with equal-size clusters.

Double sampling, sequential sampling or multiphase sampling are ways of sampling in which some information is collected by sample and then used as the basis for selecting a subsample for further study, because this is more convenient or economical.

Convenience samples are samples that are non-probability and unrestricted. Purposive sampling is also a type of non-probability sampling, but one that conforms to certain criteria. There are two types:

  1. Judgment sampling: occurs when a researcher selects sample members to conform to some criterion.

  2. Quota sampling: improves representativeness. In this type there is usually more than one control dimension, and each should have a distribution in the population that can be estimated and be pertinent to the topic being studied.

Snowball sampling is a way of sampling in which individuals are discovered in the initial stage, and may/may not be selected through probability methods. This type of sampling can be very useful if the aim is to sample subjects that are very difficult to identify, as they are nowhere registered as a population.

When and how is survey research conducted? - Chapter 7

What are the different types of data collection?

There are two different types of data collection methods: the observation approach and the communication approach. The observation approach involves observing conditions, behaviour, events, people or processes. The communication approach involves surveying or interviewing people and recording their responses for analysis; the researcher communicates with participants about various topics, including their attitudes, motivations, intentions and expectations.

The researcher determines the data-collection approach by considering:

  • The types of data needed.

  • The investigative questions the researcher must answer.

  • The desired data type (nominal/ordinal/interval/ratio).

  • The characteristics of the sample unit.

A survey is a measurement process used to collect information during a highly structured interview, sometimes with a human interviewer and other times without. The goal of the survey is to obtain comparable data across subsets of the chosen sample so that similarities and differences can be found. A well-chosen question can yield information that would take much more time and effort to gather by observation.

To obtain comparable data:

  • Questions are carefully chosen or crafted.

  • Questions are sequenced.

  • Questions are precisely asked of each participant.

How do you choose your method of communication?

A survey that uses telephone, mail or the Internet as the medium of communication can expand geographic coverage at a fraction of the cost and time required by observation. When combined with statistical probability sampling for selecting participants, survey findings and conclusions can be applied to large and diverse populations.

The strength of a survey as a primary data-collection approach is its versatility: it does not require there to be a visual or other objective perception of the information sought by the researcher.

There are three major sources of error in communication research:

  1. Measurement questions and survey instruments.

  2. Interviewers.

  3. Participants.

Research may become useless if the researcher:

  • Selects or crafts inappropriate questions.

  • Asks questions in an inappropriate order.

  • Uses inappropriate transitions and instructions to obtain information.

For a survey to succeed, three conditions must be met by participants:

  1. The participant must possess the information being targeted by the investigative questions.

  2. The participant must understand his or her role in the interview as the provider of accurate information.

  3. The participant must have adequate motivation to cooperate (participant receptiveness).

To increase the motivation of participants, it is important to establish a friendly relationship with them. This helps to avoid two kinds of error: whether they respond (willingness to respond) and how they respond.

Three factors that help with participant receptiveness:

  1. The participant must believe that the participation experience will be pleasant and satisfying.

  2. The participant must believe that answering the survey is an important and worthwhile use of his/her time.

  3. The participant must dismiss any mental reservations that he/she might have about participation.

What errors occur due to participants?

Participants cause error in two ways: Whether they respond (willingness) and how they respond. To avoid participant-based errors:

Dealing with non-response error - By failing to respond or refusing to respond, participants create a non-representative sample for the study overall or for a particular item or question in the study.

In surveys, non-response error occurs when the responses of participants differ in some systematic way from the responses of nonparticipants.

This occurs when:

  • The researcher cannot locate the person (the pre-designated sample element) to be studied.

  • The researcher is unsuccessful in encouraging that person to participate.

Solutions to reduce errors of non-response:

  • Establishing and implementing call-back procedures.

  • Creating a non-response sample and weighting results from this sample.

  • Substituting another individual for the missing participant.

Response errors occur during the interview (created by either the interviewer or the participant) or during the preparation of data for analysis.

Participant-initiated error: when the participant fails to answer fully and accurately, either by choice or because of inaccurate or incomplete knowledge.

The interviewer can do little about the participant’s information level. The most appropriate applications for communication research are those where participants are uniquely qualified to provide the desired information.

What errors occur due to the interviewer?

Interviewer error is response bias caused by the interviewer. Sources of interviewer error include:

  • Failure to secure full participant cooperation (sampling error). The sample is likely to be biased if interviewers do not obtain participant cooperation.

  • Failure to record answers accurately and completely (data entry error). When the recording procedure forces the interviewer to summarize or interpret participant answers, or provides insufficient space to record answers accurately, the data may be biased.

  • Failure to consistently execute interview procedures. The precision of survey estimates will be reduced and there will be more error around estimates to the extent that interviewers are inconsistent in ways that influence the data.

  • Failure to establish appropriate interview environment. Answers may be systematically inaccurate or biased when interviewers fail to appropriately train and motivate participants or fail to establish a suitable interpersonal setting.

  • Falsification of individual answers or whole interviews. Surveying is difficult work, often done by part-time employees, usually with only limited training and under little direct supervision. At times, a falsification of an answer to an overlooked question happens, as an interviewer may put his own answer in the blank space.

  • Inappropriate influencing behaviour. An interviewer can distort the results of any survey by inappropriate suggestions, directions, or verbal probes; by word emphasis and question rephrasing; by tone of voice; or by body language, facial reaction to an answer, or other nonverbal signals.

  • Physical presence bias. Sometimes young people modify their responses during an interview conducted by an older person, who might be perceived as an authority.

Participants also cause error by responding in such a way as to unconsciously or consciously misrepresent their actual behaviour, attitudes, preferences, motivations, or intentions (= response bias).

Participants create response bias when they modify their responses to be socially acceptable or to save face or reputation with the interviewer (social desirability bias), and sometimes even in an attempt to appear rational and logical.

One major cause of response bias is acquiescence: the tendency to be agreeable.

In order to reduce response errors, a researcher can use follow ups or reminders to increase the response rate.

In addition, there is evidence that advance notification, particularly by telephone, is effective in increasing response rates (preliminary notification).

Other concurrent techniques, such as appropriate questionnaire length, survey sponsorship, return envelopes and postage for mail surveys, personalization, cover letters and deadline dates also help to reduce the non-response error.

A researcher can conduct a semi-structured interview or survey by personal interview or telephone, or can distribute a self-administered survey by mail, fax, etc. Exhibit 7.5 provides an overview of the different communication approaches.

Telephone interviewing remains popular because of the dispersion of telephone service in households and the low cost of this method compared with personal interviewing. However, telephone interviewing also has some disadvantages.

The non-contact rate is the ratio of potential but unreached contacts (no answer, busy, answering machine, and disconnects, but not refusals) to all potential contacts.

The refusal rate refers to the ratio of contacted participants who decline the interview to all potential contacts. Moreover, the telephone interview is limited to a certain length and in the use of visual or complex questions, the interviewee might hang up, and participants are less involved.
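The two rates can be sketched with hypothetical call outcomes (the counts below are assumptions for illustration):

```python
# Hypothetical call outcomes from a telephone survey.
potential_contacts = 500
unreached = 150  # no answer, busy, answering machine, disconnects
refusals = 70    # contacted, but declined the interview

non_contact_rate = unreached / potential_contacts
refusal_rate = refusals / potential_contacts
print(non_contact_rate, refusal_rate)  # 0.3 0.14
```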

Disadvantages of telephone interviewing:

  • Inaccessible households.

  • Inaccurate or non-functioning numbers.

  • Limitation on interview length.

  • Limitations on use of visual or complex questions.

  • Ease of interview termination.

  • Less participant involvement.

  • Distracting physical environment.

Random dialling requires choosing telephone exchanges or exchange blocks and then generating random numbers within these blocks for calling.
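A minimal sketch of this procedure, with hypothetical exchange blocks and a hypothetical number format:

```python
import random

# Hypothetical exchange blocks (area code plus exchange prefix).
exchanges = ["020-555", "030-555"]

random.seed(7)
# Random dialling: pick an exchange block, then generate a random
# four-digit suffix within that block.
exchange = random.choice(exchanges)
number = f"{exchange}-{random.randint(0, 9999):04d}"
print(number)
```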

Computer-assisted telephone interviewing (CATI): the interviewer reads questions from a computer screen and enters answers directly into the computer.

The self-administered questionnaire is the most popular type of survey. Self-administered surveys can be delivered by mail or computer, or can be intercept studies. Advantages of self-administered surveys include:

  • They typically cost less than surveys via personal interviews.

  • Mail surveys are typically perceived as more impersonal, providing more anonymity than the other communication modes, including other methods for distributing self-administered questionnaires.

Disadvantages of self-administered surveys include:

  • Researchers cannot expect to obtain large amounts of information and cannot deeply investigate the issue they want to;

  • Participants often refuse to cooperate with a long and/or complex mail or intercept questionnaire unless they perceive a personal benefit.

A survey via personal interview is a two-way conversation between a trained interviewer and a participant.

The main advantage lies in the depth of information and detail that can be secured. It far exceeds the information secured from telephone and self-administered studies. The interviewer can also do more things to improve the quality of the information received. However, this method is costly and time consuming, and its flexibility can result in excessive interviewer bias.

Computer-assisted personal interviewing (CAPI): special scoring devices and visual materials are used.

Intercept interview: targets participants in centralised locations, such as shoppers in retail malls. This reduces the costs associated with travel.

Lastly, outsourcing survey services offers special advantages to managers. A professionally trained research staff, centralised-location interviewing, focus group facilities and computer-assisted facilities are among them. Speciality firms offer software and computer-based assistance for telephone and personal interviewing as well as for mail and mixed modes. Panel suppliers produce data for longitudinal studies of all varieties. However, it is very important to be careful in selecting interviewers and to give them appropriate training.

Web-based survey: A web-based survey has the power of CATI systems, but without the expense of network administrators, specialised software or additional hardware.

Apart from asking people, it is also possible to observe them in order to obtain information. Research done with this information is referred to as the observational method. The following are examples of advantages:

  • The only method that is able to get information from people who cannot properly express themselves.

  • It reduces retrospective biases.

  • Independent of reports by others, thereby reducing the respondent bias.

The following are examples of limitations of the observations method:

  • Observation is a slow and costly process which requires human observers.

  • The observer has to be at the place of the event, yet it is difficult to predict when the event will take place.

  • Observation is limited as a way to learn about the past.

Structured observation is similar to survey research, yet it differs in the sense that instead of asking respondents what they are doing, you observe what they are doing. There are two types:

  1. Direct observation, which happens when the observer is present physically and is monitoring personally what takes place.

  2. Indirect observation, which happens when the recording is done by mechanical, photographic or electronic means.

Observations can be aimed at behavioural and non-behavioural activities and conditions. The following belong to non-behavioural observation:

  • Record analysis.

  • Physical condition analysis.

  • Physical process analysis.

The following belong to behavioural observations:

  • Non-verbal analysis.

  • Linguistic analysis.

  • Extra-linguistic analysis.

  • Spatial analysis.

Observations at a factual level are direct descriptions of what is happening and what can be seen. Observations at an inferential level translate what is seen to a concept that cannot be observed.

What is the most important information from this chapter?

Whenever a study is qualitative, it often uses the term participant observation. Whenever a study is quantitative, it often uses the term structured observation.

The communication approach is an approach that involves surveying people and recording their responses for analysis. In order for a survey to deliver successful results, it should meet the following conditions:

  • The participant has to possess the information being targeted by the investigative questions.

  • The participant must understand his/her role in the interview as the provider of accurate information.

  • The participant must perceive adequate motivation to cooperate.

In order to establish a friendly relationship with the respondent, the following factors might be helpful:

  • The participant must believe that the participation experience will be pleasant and satisfying.

  • The participant must believe that answering the survey is an important and worthwhile use of his/her time.

  • The participant must dismiss any mental reservations that he/she might have about participation.

A non-response error happens when the responses of participants differ in some systematic way from the responses of non-participants. There are several solutions to this problem:

  • Establishing and implementing call-back procedures.

  • Creating a non-response sample and weighting results from this sample.

  • Substituting another individual for the missing participant.
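The second solution above, weighting results from a non-response sample, can be sketched as a simple weighted average; the response rate and group means below are invented for illustration.

```python
def weighted_estimate(respondent_mean, nonrespondent_mean, response_rate):
    """Combine the estimate from respondents with the estimate from a
    follow-up sample of non-respondents, weighting each by its share
    of the original sample."""
    return (response_rate * respondent_mean
            + (1 - response_rate) * nonrespondent_mean)

# Hypothetical figures: 60% responded with a mean rating of 7.0, while a
# follow-up sample of non-respondents averaged 5.0.
print(weighted_estimate(7.0, 5.0, 0.6))
```

If the non-respondents are ignored, the estimate (7.0) overstates the population value; weighting in the follow-up sample pulls it back toward the correct mixture.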

A personal interview is a two-way conversation initiated by an interviewer to obtain information from a participant. An advantage of personal interviews is that the interviewers can make use of special scoring devices and visual materials, with the help of computer-assisted personal interviewing (CAPI).

The intercept interview aims at participants in centralized locations, for example shoppers in shopping centres. Probing is a technique whereby respondents are stimulated to answer more fully and relevantly. Therefore a probe should be neutral and seem as a part of the conversation.

A response error happens when the reported data differ from the actual data. A participant-initiated error happens when the respondent fails to answer fully and accurately, either by choice or because of inaccurate or incomplete knowledge. Interviewer error is a major source of response bias.

Telephone interviews are very much like personal interviews, except that they are conducted by telephone. With telephone interviews, another way of securing an immediate response is by making use of the computer-administered telephone survey.

Non-contact rate: The ratio of potential non-contacts to all potential contacts.

Refusal rate: The ratio of respondents who decline the interview to all potential/eligible contacts.

Random dialling procedures: A procedure for bypassing out-of-date directories that require choosing phone exchange blocks and then generating random numbers within these blocks for calling.

The self-administered questionnaire is a very simple type of questionnaire that is usually just "left behind" to be filled in by a respondent. Examples of these types of questionnaires are service evaluations of hotels. There are several advantages and disadvantages related to the following aspects of self-administered questionnaires:

  • Costs.

  • Sample accessibility.

  • Response time.

  • Anonymity.

  • Topic coverage.

  • Non-response error in mail and web-based surveys.

  • Reducing non-response error.

Web-based surveys are a special form of self-administered surveys. There are three types:

  1. Target web survey.

  2. Self-selected survey.

  3. Social-media-based survey.

The observational method is a method whereby researchers observe people in order to obtain information from them. Method reactivity biases happen when respondents change their behaviour because they know they are being observed.

Direct observation happens when the observer is physically present and personally monitors what takes place. Indirect observation occurs when respondents are observed through mechanical, photographic or electronic means. With the latter, concealment is used to shield the observer from the observed. An example is a one-way mirror.

The following are examples of non-behavioural observation:

  • Record analysis is a prevalent form of structured observation, which involves historical or current records, and public or private records.

  • Physical condition analysis is known for store audits of merchandise availability, studies of plant safety compliance, analysis of inventory conditions and analysis of financial statements.

  • Process (activity) analysis includes time/motion studies of manufacturing processes.

The following are examples of behavioural observation:

  • Non-verbal behavior includes analyzing the body movements, motor expressions and the exchanged glances of the respondent.

  • Linguistic behaviour is, for example, the number of "ah" sounds people make during a presentation, which can be perceived as very annoying.

  • Extra-linguistic behaviour includes, for example, the loudness of someone's voice, the rate of speaking, interaction and dialect.

  • Spatial relationships show how people relate to others physically.

Observations at a factual level are direct descriptions of what is happening and what can be seen. Observations at an inferential level translate what is seen to a concept that cannot be observed.

 

There are two different types of data collection methods: the observation approach and the communication approach. The observation approach involves observing conditions, behaviour, events, people or processes. The communication approach involves surveying/interviewing people and recording their responses for analysis: communicating with participants about various topics, including their attitudes, motivations, intentions and expectations.

 

 

How do you conduct a successful experiment? - Chapter 12

 

Causal methods are research methods which answer questions such as “Why do events occur under some conditions and not under others?”

Ex post facto research designs, in which a researcher interviews respondents or observes what is or what has been, have the potential for discovering causality. The distinction is that with ex post facto designs the researcher is required to accept the world as it is found, whereas an experiment allows the researcher to systematically alter the variables of interest and observe what changes follow.

Experiments are studies which involve intervention by the researcher beyond what is required for measurement. Usually this means manipulating some variable in a setting and observing how it affects the subjects being studied (e.g. physical entities or people). One manipulates the independent or explanatory variable and observes whether the hypothesized dependent variable is affected by this. In a causal relationship there is at least one independent variable (IV) and one dependent variable (DV). It is hypothesized that in some way the IV ‘causes’ the DV to occur.

The basis for the conclusion of the experiment is formed by three types of evidence:

  1. There must be an agreement between independent and dependent variables. In other words, the presence or absence of one can be linked to the presence or absence of the other.

  2. The time order of the occurrence of the variables has to be considered. The dependent variable should not go before the independent variable.

  3. The researchers ought to be confident that other extraneous variables did not influence the dependent variable. Researchers control for extraneous variables in order to ensure that they are not the source of influence and do not confound the planned comparison. Standardized conditions for control can be arranged under laboratory conditions. Such controls are important, but further precautions are needed so that the results achieved reflect only the influence of the independent variable on the dependent variable.

What are the advantages of experiments?

Causality cannot be proved with certainty but the probability of one variable being related to another can be established credibly. An experiment comes closer than any primary data collection method to accomplishing this.

  • The primary advantage is the researcher’s ability to manipulate the independent variable. Consequently, the probability increases that changes in the dependent variable are a function of that manipulation. Also, a control group serves as a comparison to assess the existence and potency of the manipulation.

  • The second advantage is that influence of extraneous variables can be controlled more effectively than in other designs. This helps the researcher isolate experimental variables and evaluate their impact over time.

  • Thirdly, the convenience and cost are superior to other methods. This allows the experimenter opportunistic scheduling of data collection and the flexibility to adjust variables and conditions that evoke extremes not observed under routine circumstances. Also, the experimenter can assemble combinations of variables for testing rather than having to search for their unexpected appearance in the study environment.

  • Fourth, replication (= repeating an experiment with different subject groups and conditions) leads to the discovery of an average effect of the independent variable across people, situations and times.

  • Fifth, researchers can use naturally occurring events and, to some extent, field experiments (a study of the dependent variable in actual environmental conditions) to reduce subjects’ perceptions of the researcher as a source of intervention or deviation in their everyday lives.

What are the disadvantages of experiments?

  • It is argued that the primary disadvantage of the experimental method is the artificiality of the laboratory. However, many subjects’ perceptions of an unnatural environment can be improved by investment in the facility.

  • Second, despite random assignment, generalization from non-probability samples can pose problems. Additionally, when an experiment is not successfully disguised, volunteer subjects are often those with the most interest in the topic.

  • Despite the relatively low costs of experimentation, many applications of experimentation far outrun the budgets for other primary data collection methods.

  • Experimentation is most effectively targeted at problems of the present or immediate future, as the studies of the past are not feasible, and studies about intentions or predictions are difficult.

  • There are limits to the types of manipulation and controls that are ethical in the study of people.

How do you conduct an experiment?

Researchers, in a well-executed experiment, must complete a series of activities to carry out their craft successfully.

  • The researcher has to start with selecting relevant variables and specifying treatment levels.

  • Then the issue of control of the experimental environment has to be considered.

  • Subsequently an experimental design has to be chosen and subjects have to be selected and assigned.

  • The next step is pilot testing, revising and testing.

  • In the end the data collected has to be analysed.

Step 1 – selecting relevant variables: The researcher’s task is to translate a vague problem into the question or hypothesis that best states the objectives of the research. Depending on the complexity of the problem, investigative questions and additional hypotheses can be created to address specific facets of the study or data that need to be gathered.
A hypothesis is a relational statement, as it describes a relationship between two or more variables. It must also be operationalized (how concepts are transformed into variables to make them measurable and subject to testing). Once the researcher has formulated the research question and hypothesis, he has to:

  • Select variables that are the best operational representations of the original concepts.

  • Determine how many variables to test.

  • Select or design appropriate measures for them.

The number of variables in an experiment is constrained by the project budget, the time allocated, the availability of appropriate controls, and the number of subjects being tested. There must be more subjects than variables – for statistical reasons.

The selection of measures for testing requires a thorough review of the available literature and instruments. Also, measures must be adapted to the unique needs of the research situation without compromising their intended purpose or original meaning.

Step 2 – specifying treatment levels: In an experiment, participants experience a manipulation of the independent variable, called the experimental treatment. The treatment levels of the independent variable are the arbitrary or natural groups the researcher makes within the independent variable of an experiment. The levels assigned to an independent variable should be based on simplicity and common sense.

A control group could provide a base level for comparison. The control group is composed of subjects who are not exposed to the independent variable(s), in contrast to those who receive the experimental treatment.

Step 3 – controlling the experimental environment: Extraneous variables can appear as differences in age, gender, race, dress, communication, etc. These have the potential for distorting the effect of the treatment on the dependent variable and must be controlled or eliminated. However, at this stage, a researcher is mainly concerned with environmental control, holding constant the physical environment of the experiment. For consistency, the introduction of the experiment to the subjects and the instructions would likely be videotaped. The arrangement of the room, the time of administration, the experimenter’s contact with the subjects, and so forth, must all be consistent across each administration of the experiment. Other forms of control involve subjects and experimenters. When subjects do not know if they are receiving the experimental treatment, they are said to be blind.

When the experimenters do not know if they are giving the treatment to the experimental group or to the control group, the experiment is said to be double blind. Both approaches control unwanted complications such as subjects’ reactions to expected conditions or experimenter influence.

Step 4 – choosing the experimental design: Experimental designs are unique to the experimental method. They serve as positional and statistical plans to designate relationships between experimental treatments and the experimenter’s observations and measurement points in the temporal scheme of the study. The researchers apply their knowledge to select one design that is best suited to the goals of the research. Judicious selection of the design improves the probability that the observed change in the dependent variable was caused by the manipulation of the independent variable and not by any other factor. It simultaneously strengthens the generalizability of results beyond the experimental setting.

Step 5 – selecting and assigning participants: The selected participants should be representative of the population to which the researcher wishes to generalize the results from the study. In principle, the procedure for random sampling of experimental subjects is similar to the selection of respondents for a survey. First the researcher prepares a sampling frame and then randomly assigns the subjects for the experiment to groups.

Systematic sampling may be used if the sampling frame is free from any form of periodicity that parallels the sampling ratio. Since the sampling frame is often small, experimental subjects are recruited; thus, they are a self-selecting sample.

However, if randomization is used, those assigned to the experimental group are likely to be similar to those assigned to the control group.

Random assignment to the groups is required to make the groups as comparable as possible with respect to the dependent variable. Randomization does not guarantee that if the groups were pretested they would be pronounced identical; but it is an assurance that those differences remaining are randomly distributed.
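Random assignment as described above can be sketched in a few lines; the `randomly_assign` helper is a hypothetical illustration, not a procedure from the book.

```python
import random

def randomly_assign(subjects, n_groups=2, seed=None):
    """Shuffle the subjects and deal them round-robin into n_groups,
    so group sizes differ by at most one."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    groups = [[] for _ in range(n_groups)]
    for i, subject in enumerate(pool):
        groups[i % n_groups].append(subject)
    return groups

# Assign 20 hypothetical subject IDs to an experimental and a control group
experimental, control = randomly_assign(range(20), n_groups=2, seed=7)
print(len(experimental), len(control))  # 10 10
```

Because the shuffle is random, any remaining differences between the groups are randomly distributed rather than systematic, which is exactly the assurance randomization provides.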

Matching may be used when it is not possible to randomly assign subjects to groups. This employs a non-probability quota sampling approach. The object of matching is to have each experimental and control subject matched on every characteristic used in the research. Since the characteristics of concern are only those that are correlated with the treatment condition or the dependent variable, they are easier to identify, control, and match.

Some authorities suggest a quota matrix as the most efficient means of visualising the matching process. For example, one-third of the subjects from each cell of the matrix would be assigned to each of the three groups (two experimental + one control). If matching does not alleviate the assignment problem, a combination of matching, randomization, and increasing the sample size would be used.
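The quota-matrix idea can be sketched as follows; the subject characteristics and the `quota_matrix_assign` helper are invented for illustration.

```python
import random
from collections import defaultdict

def quota_matrix_assign(subjects, key, n_groups=3, seed=None):
    """Group subjects into matrix cells by their matching characteristics,
    then split each cell evenly across the groups, so every group ends up
    with the same mix of characteristics."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for s in subjects:
        cells[key(s)].append(s)
    groups = [[] for _ in range(n_groups)]
    for members in cells.values():
        rng.shuffle(members)           # randomize within each cell
        for i, s in enumerate(members):
            groups[i % n_groups].append(s)
    return groups

# Hypothetical subjects characterized by (gender, age band): three cells
# of six subjects each, split across two experimental groups + one control
subjects = [("m", "young")] * 6 + [("f", "young")] * 6 + [("m", "old")] * 6
g1, g2, g3 = quota_matrix_assign(subjects, key=lambda s: s, n_groups=3, seed=1)
```

Matching within cells and shuffling before the split combines the two remedies the text mentions: the groups are balanced on the matched characteristics, while assignment within each cell remains random.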

Step 6 – pilot testing, revising and testing: The procedures for this stage are similar to those for other forms of primary data collection. Pilot testing is intended to reveal errors in the design and improper control of extraneous or environmental conditions. Pretesting the instruments permits refinement before the final test. This allows for revising scripts, determining control problems with laboratory conditions, and scanning of the environment for factors that might confound the results.

What does validity mean in experimentation?

There is always a question about whether the results are true. Validity has been defined as whether a measure accomplishes its claims. There are several different types of validity, but here only the two major varieties are considered: internal validity – do the conclusions drawn about a demonstrated experimental relationship truly imply cause? – and external validity – does an observed causal relationship generalize across persons, settings, and times? Each type of validity has specific threats a researcher should guard against.

Internal validity

Following are some of the threats to the internal validity:

  • History – in an experiment some events may occur that confuse the relationship being studied. In many experimental designs, a researcher takes a control measurement (O1) of the dependent variable before introducing the manipulation (X). After the manipulation, a researcher takes an after-measurement (O2) of the dependent variable. The difference between O1 and O2 is the change that the manipulation has caused.

  • Maturation – changes may also occur within the subject that are a function of the passage of time and are not specific to any particular event. These are of special concern when the study covers a long time, but they may also be factors in tests that are as short as an hour or two. A subject can become hungry, bored, or tired in a short time, which can affect response results.

  • Testing – the process of taking a test can affect the scores of a second test. Taking the first test can have a learning effect that influences the results of the second test.

  • Instrumentation – this threat to internal validity results from changes between observations in either the measuring instrument or the observer. Using different questions at each measurement is an obvious source of potential trouble, but using different observers or interviewers also threatens validity. There can even be an instrumentation problem if the same observer is used for all measurements - experience, boredom, fatigue, and anticipation of results can all distort the results of separate observations.

  • Selection – an important threat to internal validity is the differential selection of subjects for experimental and control groups. Validity considerations require that the groups be equivalent in every respect. If subjects are randomly assigned to experimental and control groups, this selection problem can be largely overcome. Additionally, matching the members of the groups on key factors can enhance the equivalence of the groups.

  • Statistical regression – this factor operates especially when groups have been selected by their extreme scores. No matter what is done between O1 and O2, there is a strong tendency for the average of the high scores at O1 to decline at O2 and for the low scores at O1 to increase. This tendency results from imperfect measurement that, in effect, records some persons abnormally high and abnormally low at O1. In the second measurement, members of both groups score more closely to their long-run mean scores.

  • Experimental mortality – this occurs when the composition of the study groups changes during the test. Attrition is especially likely in the experimental group, and with each dropout the group changes. Because members of the control group are not affected by the testing situation, they are less likely to withdraw.

In general, the threats mentioned above are dealt with adequately in experiments by random assignment. However, five additional threats to internal validity are independent of whether or not one randomizes. The first three have the effect of equalizing experimental and control groups.

  • Diffusion or imitation of treatment – if the control group learns of the treatment (by talking to people in the experimental group) it eliminates the difference between the groups.

  • Compensatory equalization – where the experimental treatment is much more desirable, there may be an administrative reluctance to withdraw the control group members. Compensatory actions for the control groups may confound the experiment.

  • Compensatory rivalry – this may occur when members of the control group know they are in the control group. This may generate competitive pressures.

  • Resentful demoralization of the disadvantaged – when the treatment is desirable and the experiment is obtrusive, control group members may become resentful of their deprivation and lower their cooperation and output.

  • Local history – the regular history effect already mentioned impacts both experimental and control groups alike. However, when one assigns all experimental persons to one group session and all control people to another, there is a chance for some peculiar event to confound results. This can be handled by administering treatments to individuals or small groups that are randomly assigned to experimental or control sessions.
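The statistical regression threat listed above can be illustrated with a small simulation: subjects selected for extreme O1 scores drift back toward their long-run mean at O2 even without any treatment. The score distributions below are assumptions chosen purely for illustration.

```python
import random

random.seed(3)

# Each subject has a stable true score; each observation adds measurement noise.
true_scores = [random.gauss(50, 10) for _ in range(1000)]
o1 = [t + random.gauss(0, 10) for t in true_scores]
o2 = [t + random.gauss(0, 10) for t in true_scores]

# Select the 100 subjects who scored highest at O1 (an "extreme" group).
top = sorted(range(1000), key=lambda i: o1[i], reverse=True)[:100]
mean_o1 = sum(o1[i] for i in top) / 100
mean_o2 = sum(o2[i] for i in top) / 100

# With no treatment at all, the extreme group's average falls at O2,
# because part of their high O1 scores was just favourable noise.
print(round(mean_o1, 1), round(mean_o2, 1))
```

The drop from O1 to O2 here is pure measurement artefact, which is why a treatment given to an extreme-scoring group can look effective (or harmful) when nothing real happened.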

External validity

Internal validity factors cause confusion about whether the experimental treatment (X) or extraneous factors are the source of observation differences. External validity is concerned with the interaction of the experimental treatment with other factors and the resulting impact on the ability to generalize to (and across) times, settings, or persons. The following interactive possibilities are among the major threats to external validity:

  • Reactivity of testing on X – the reactive effect refers to sensitising subjects via a pre-test so that they respond to the experimental stimulus (X) in a different way. This before-measurement effect can be particularly significant in experiments where the IV is a change in attitude.

  • Interaction of selection and X – the process by which test subjects are selected for an experiment may be a threat to external validity. The population from which one selects subjects may not be the same as the population to which one wishes to generalize results.

  • Other reactive factors – experimental settings may have a biasing effect on a subject’s response to X. An artificial setting can obviously produce results that are not representative of larger populations. If subjects know they are participating in an experiment, there may be a tendency to role-play in a way that distorts the effects of X. Another reactive effect is the possible interaction between X and subject characteristics.

Problems of internal validity can be solved by the careful design of experiments, but this is less true for problems of external validity.

What do experimental research designs look like?

The many experimental designs differ greatly in their power to control contamination of the relationship between independent and dependent variables. The most widely accepted designs are based on this characteristic of control: (1) pre-experiments, (2) true experiments, and (3) field experiments.

Pre-experimental designs

All three fail to adequately control the various threats to internal validity.

  • After-only study: First, treatment or manipulation of the independent variable is conducted, and then observation or measurement of the dependent variable takes place. The lack of a pre-test and control group makes this design inadequate for establishing causality.

  • One group Pre-test-Post-test Design: In this case there is a pre-test, which takes place before the manipulation, and the post-test. Still a weak design – how well does it control for history? Maturation? Testing effect? The others?

  • Static Group Comparison – the design provides for two groups, one of which receives the experimental stimulus while the other serves as a control.

The addition of a comparison group creates a substantial improvement over the other two designs. Its chief weakness is that there is no way to be certain that the two groups are equivalent.

True experimental designs

The major deficiency of the pre-experimental designs is that they fail to provide comparison groups that are truly equivalent. The way to achieve equivalence is through matching and random assignment. With randomly assigned groups, tests of statistical significance of the observed differences can be employed. It is common to show an X for the test stimulus and a blank for the existence of a control situation. This is an oversimplification of what really occurs. More precisely, there is an X1 and an X2 - sometimes more. X1 identifies one specific independent variable, while X2 is another independent variable that has been chosen, often randomly, as the control case. Different levels of the same independent variable may also be used, with one level serving as the control.

Pre-test-Post-test Control Group Design

This design consists of adding a control group to the one-group pre-test-post-test design and assigning the subjects to either of the groups by a random procedure (R). The seven major internal validity problems are dealt with fairly well in this design, but there are still some difficulties. Local history may occur in one of the groups, and not in the other. Also, if communication exists between people in test and control groups, there can be competition and other internal validity problems.

Maturation, testing, and regression are handled well because one would expect them to be felt equally in experimental and control groups. Mortality can be a problem if there are different dropout rates in the study groups. Selection is adequately dealt with by random assignment. The record of this design is not as good on external validity - there is a chance for a reactive effect from testing. This might be a substantial influence in attitude change studies where pre-tests introduce unusual topics and content. This design also doesn’t ensure against interaction between selection and the experimental variable.

Even random selection may be defeated by a high decline rate by subjects, resulting in using a disproportionate share of people who are essentially volunteers and who may not be typical of the population.

If this occurs, the experiment will need to be replicated several times with other groups under other conditions, before a researcher can be confident of external validity.

The experimental approach can also be combined with the survey approach. An example of the implementation of experimental designs in surveys is the factorial survey, which is also known as vignette research.

Post-test-Only Control Group Design

The pre-test measurements are omitted in this design. Pre-tests are well established in classical research design but are not really necessary when it is possible to randomize.

The simplicity of this design makes it more attractive than the pre-test-post-test control group design. Internal validity threats from history, maturation, selection, and statistical regression are adequately controlled by random assignment. Because participants are measured only once, the threats of testing and instrumentation are reduced, but different mortality rates between experimental and control groups continue to be a potential problem. The external validity problem of testing interaction effect is reduced.

Field experiments: quasi- or semi-experiments?

Under field conditions, a researcher often cannot control enough of the extraneous variables or the experimental treatment to use a true experimental design. If the stimulus condition occurs in a natural environment, a field experiment is required. Some studies are not possible with a control group, a pre-test, or randomization of customers. Pre-experimental designs or quasi-experiments are used to deal with such conditions.

It is often unknown when or to whom to expose the experimental treatment in a quasi-experiment. Usually, though, it can be decided when and whom to measure. A quasi-experiment is inferior to a true experimental design but is usually superior to pre-experimental designs. There are two groups of this type that can be distinguished:

  1. Non-equivalent control group design.

  2. Time-series design.

Non-equivalent Control Group Design

This is a strong and widely used quasi-experimental design. It differs from the pre-test-post-test control group design in that the test and control groups are not randomly assigned.

There are two varieties.

  1. Intact equivalent design, in which the membership of the experimental and control groups is naturally assembled. Ideally, the two groups are as alike as possible. This design is especially useful when any type of individual selection process would be reactive.

  2. The self-selected experimental group design is weaker because volunteers are recruited to form the experimental group, while no volunteer subjects are used for control. This design is likely when subjects believe it would be in their interest to be a subject in an experiment.

Separate Sample Pre-test-Post-test Design

Most applicable when it is unknown when and to whom to introduce the treatment, but it can be decided when and whom to measure. This is a weaker design because several threats to internal validity are not handled adequately.

History can confound the results but can be overcome by repeating the study at other times in other settings. It is considered superior to true experiments in external validity. Its strength results from its being a field experiment in which the samples are usually drawn from the population to which a researcher wishes to generalize the findings. The design is more appropriate where the population is large, if a before-measurement was reactive, or if there was no way to restrict the application of the treatment.

Time series and comparison groups

This design introduces repeated observations before and after the treatment, and allows subjects to act as their own controls. It is a good way to research unplanned events in an ex-post-facto manner.

What is the most important information from this chapter?

The causal method is a type of research method that investigates why events occur under some conditions and not under others. Experiments are studies that involve intervention by the investigator beyond that required for measurement.

A hypothesis is a statement used in empirical testing that describes a predicted relationship between two or more variables. With empirical testing, there is always at least one independent variable, which is the variable that is manipulated by the investigator and causes a change in the dependent variable. There also has to be a dependent variable, which is the variable that is being measured by the investigator. It is expected to be affected by a manipulation of the independent variable.

Replication is repeating an experiment with different subject groups and conditions. Field experiments are studies that occur under the environmental conditions where the DV occurs and is measured.

When relevant variables are selected, they have to be defined in a conceptual and operational way, meaning that the concepts are transformed into variables to make them measurable and ready for testing.

Environmental control entails holding constant the physical environment of the experiment. When a subject is blind, they don’t know if they are receiving the experimental treatment. The experiment is double blind when the experimenters themselves do not know if they are giving the treatment to the experimental group or the control group.

Matching employs a non-probability quota sampling approach.

There are seven threats to internal validity:

  1. History.

  2. Maturation.

  3. Testing.

  4. Instrumentation.

  5. Selection.

  6. Statistical regression.

  7. Experimental mortality.

Further threats: diffusion or imitation of treatment, compensatory equalization, compensatory rivalry, resentful demoralization of the disadvantaged.

The cornerstones of true experimental designs are:

  • They consist of an experimental group and a control group.

  • The investigator makes sure that the experimental and control groups are equal, either through randomly assigning subjects to both groups or through matching.

There are a couple of experimental designs:

  1. Pre- and post-test design (within-subject design)

  2. Post-test-only control group design (between-subject design)

  3. Pre-test-post-test control group design, which also includes factorial surveys, which are also referred to as vignette research. In this type of survey, the investigator presents the respondent with a brief and explicit description of a situation and then asks him/her to evaluate the situation and make a decision.

Field experiments are conducted in a natural setting and, often, respondents do not know that their behaviour is being monitored. In a quasi-experiment, it is very often unknown when or to whom to expose the experimental treatment, making it inferior to a true experimental design. There are two types of quasi-experiments:

  1. Non-equivalent control group designs.

  2. Time-series design.

 

Causal methods are research methods which answer questions such as “Why do events occur under some conditions and not under others?”

 

Ex post facto research designs, in which a researcher interviews respondents or observes what is or what has been, have the potential for discovering causality. The distinction is that with ex post facto designs the researcher is required to accept the world as it is found, whereas an experiment allows the researcher to systematically alter the variables of interest and observe what changes follow.

Experiments are studies which involve intervention by the researcher beyond what is required for measurement. Usually this means manipulating some variable in a setting and observing how it affects the subjects being studied (e.g. physical entities or people). One manipulates the independent or explanatory variable and observes whether the hypothesized dependent variable is affected by this. In a causal relationship there is at least one independent variable (IV) and one dependent variable (DV). It is hypothesized that in some way the IV ‘causes’ the DV to occur.

 

How are instrument designs developed? - Chapter 13

 

There are 3 suggested phases of developing an instrument design.

  1. Revisiting the research question hierarchy.

  2. Constructing and refining the measurement questions.

  3. Drafting and refining instruments.

What is important in Phase 1?

Revisiting the research question hierarchy

In general, once the researcher understands the connection between the investigative questions and the potential measurement questions, a strategy for the survey is the next step. This proceeds to getting down to the particulars of instrument design. The following are important issues to be considered:

  • Type of scale for desired analysis – the analytical procedures available to the researcher are determined by the scale types used in the survey. It is important to plan the analysis before developing the measurement questions.

  • Communication approach – Communication-based research may be conducted by personal interview, telephone, mail, computer, or some combination of these (called hybrid studies). The different delivery mechanisms result in different introductions, instructions, instrument layout, and conclusions.

  • Disguising objectives and sponsors – it has to be decided whether the purpose of the study should be disguised. A disguised question is designed to conceal the question’s true purpose. The decision about when to use disguised questions within surveys may be made easier by identifying four situations where disguising the study objective is or is not an issue:

    1. Willingly shared, conscious-level information – in surveys requesting conscious-level information that should be willingly shared, either disguised or undisguised questions may be used, but the situation rarely requires disguised techniques.

    2. Reluctantly shared, conscious-level information – sometimes the participant knows the information which a researcher needs but is reluctant to share it for a variety of reasons. When the participant is asked for an opinion on some topic on which he may hold a socially unacceptable view, projective techniques are used. In this type of disguised question, the survey designer phrases the questions in a hypothetical way or asks how other people in the participant’s experience would answer the question. The assumption is that responses to these questions will indirectly reveal the participant’s opinions.

    3. Knowable, limited-conscious-level information – not all information is at the participant’s conscious level. Given some time – and motivation – the participant can express this information. Asking about individual attitudes when participants know they hold the attitude but have not explored why they hold the attitude may encourage the use of disguised questions.

    4. Subconscious-level information – in assessing buying behaviour, it is accepted that some motivations are subconscious. This is true for attitudinal information as well. Seeking insight into the basic motivations underlying attitudes or consumption practices may or may not require disguised techniques.

  • Preliminary analysis plan – researchers are concerned with adequate coverage of the topic and with securing the information in its most usable form. A good way to test how well the study plan meets those needs is to develop ‘dummy’ tables that display the data one expects to secure. Each dummy table is a cross-tabulation between two or more variables. The preliminary analysis plan serves as a check on whether the planned measurement questions meet the data needs of the research question. This also helps the researcher determine the type of scale needed for each question – a preliminary step to developing measurement questions for investigative questions.
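A dummy table of this kind is simply an empty cross-tabulation; once data exist, it can be filled in. A minimal sketch, in which the two variables and the pilot responses are hypothetical:

```python
from collections import Counter

# Hypothetical pilot responses: (gender, preferred shopping channel)
responses = [
    ("female", "online"), ("male", "store"), ("female", "online"),
    ("male", "online"), ("female", "store"), ("male", "store"),
]

# Cross-tabulate the two variables - each cell counts one combination
crosstab = Counter(responses)
table = {
    gender: {channel: crosstab[(gender, channel)] for channel in ("online", "store")}
    for gender in ("female", "male")
}
```

If a planned measurement question cannot populate a cell of such a table, that is the signal that the question (or its scale type) does not yet meet the data needs of the research question.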

What happens in Phase 2?

Constructing and refining the measurement questions

Drafting or selecting questions begins once a complete list of investigative questions is developed and a decision is made on the collection processes to be used.

The order, type, and wording of the measurement questions, the introduction, the instructions, the transitions, and the closure in a quality questionnaire should accomplish the following:

  • Encourage each participant to provide an adequate amount of information.

  • Encourage each participant to provide accurate responses.

  • Discourage each participant from early discontinuation of participation.

  • Discourage each participant from refusing to answer specific questions.

  • Leave the participant with a positive attitude about survey participation.

Question categories and structure

Questionnaires and interview schedules (an alternative term for the questionnaires used in personal interviews) can range from those that have a great deal of structure to those that are essentially unstructured. Questionnaires contain three categories of measurement questions:

  • Administrative questions – identify the participant, interviewer, interview location, and conditions. These questions are rarely asked of the participant but are necessary for studying patterns within the data and identifying possible error sources.

  • Classification questions – usually cover sociological-demographic variables that allow participants’ answers to be grouped so that patterns are revealed and can be studied. These questions usually appear at the end of a survey (except for those used as filters or screens, questions that determine whether a participant has the requisite level of knowledge to participate).

  • Target questions (structured or unstructured) – address the investigative questions of a specific study. These are grouped by topic in the survey. Target questions may be structured (they present the participants with a fixed set of choices, often called closed questions) or unstructured (they do not limit responses but do provide a frame of reference for participants’ answers; sometimes referred to as open-ended questions).

Question content

Is first and foremost dictated by the investigative questions guiding the study. From these questions, questionnaire designers craft or borrow the target and classification questions that will be asked of participants.

Four questions, covering numerous issues, guide the instrument designer in selecting appropriate question content:

  • Should this question be asked (does it match the study objective)?

  • Is the question of proper scope and coverage?

  • Can the participant adequately answer this question as asked?

  • Will the participant willingly answer this question as asked?

Question wording

A dilemma arises from the requirements of question design (the need to be explicit, to present alternatives, and to explain meanings). All contribute to longer and more involved sentences. The difficulties caused by question wording exceed most other sources of distortion in surveys. The diligent question designer will put a survey question through many revisions. Leading questions can inject significant error by implying that one response should be favoured over another.

Response strategy

A third major area in question design is the degree and form of structure imposed on the participant.

The various response strategies offer options that include unstructured response (or open-ended response, the free choice of words) and structured response (or closed response, specified alternatives provided).

Free-response questions - also known as open-ended questions, ask the participant a question and either the interviewer pauses for the answer (which is unaided) or the participant records his or her ideas in his or her own words in the space provided on a questionnaire.

Dichotomous question - suggest opposing responses (yes/no) and generate nominal data.

Multiple-choice questions - are appropriate when there are more than two alternatives or when a researcher seeks gradations of preference, interest, or agreement. Multiple-choice questions usually generate nominal data. When the choices are numeric alternatives, this response structure may produce at least interval and sometimes ratio data. When the choices represent ordered but unequal numerical ranges or a verbal rating scale, the multiple-choice question generates ordinal data.

Checklist – when multiple responses to a single question are required, the question should be asked in one of three ways: the checklist, rating, or ranking strategy. If relative order is not important, the checklist is a logical choice. Checklists are more efficient than asking for the same information with a series of dichotomous selection questions, one for each individual factor. Checklists generate nominal data.

Rating questions - ask the participant to position each factor on a companion scale, either verbal, numeric, or graphic. Generally, rating-scale structures generate ordinal data; some carefully crafted scales generate interval data. It is important to remember that the researcher should represent only one response dimension in rating-scale response options. Otherwise, the participant is presented with a double-barreled question with insufficient choices to reply to both aspects.

Ranking questions - ideal when the relative order of the alternatives is important. The checklist strategy would identify the relevant factors of influence, but there is no way of knowing the importance the participant places on each factor. Ranking generates ordinal data.

What steps are taken in Phase 3?

Drafting and refining instruments

Phase 3 of instrument design – drafting and refinement – is a multistep process:

  • Participant screening and introduction – the introduction must supply the sample unit with the motivation to participate in the study. It must reveal enough about the forthcoming questions, usually by revealing some or all of the topics to be covered, for participants to judge their interest level and their ability to provide the desired information. In any communication study, the introduction also reveals the amount of time participation is likely to take. The introduction also reveals the researcher organization or sponsor (unless the study is disguised) and possibly the objective of the study. In personal or phone interviews the introduction usually contains one or more screen questions or filter questions to determine if the potential participant has the knowledge or experience necessary to participate in the study.

  • Measurement question sequencing - the design of survey questions is influenced by the need to relate each question to the others in the instrument. Often the content of one question (called a branch question) assumes other questions have been asked and answered. The basic principle used to guide sequence decisions is this: the nature and needs of the participant must determine the sequence of questions and the organization of the interview schedule. Four guidelines are suggested to implement this principle:

    1. The question process must quickly awaken interest and motivate the participant to participate in the interview. Put the more interesting topical target questions early. Leave classification questions not used as filters or screens to the end of the survey.

    2. The participant should not be confronted by early requests for information that might be considered personal or ego-threatening. Put questions that might influence the participant to discontinue or terminate the questioning process near the end. Use buffer questions – neutral questions designed chiefly to establish rapport with the participant.

    3. The questioning process should begin with simple items and then move to the more complex, as well as move from general items to the more specific. Put taxing and challenging questions later in the questioning process. The procedure of moving from general to more specific questions is sometimes called the funnel approach. The objectives of this procedure are to learn the participant’s frame of reference and to extract the full range of desired information while limiting the distortion effect of earlier questions on later ones.

    4. Changes in the frame of reference should be small and should be clearly pointed out. Use transition statements between different topics of the target question set.

  • Instructions - to the interviewer or participant attempt to ensure that all participants are treated equally, thus avoiding building error into the results. Two principles form the foundation for good instructions: clarity and courtesy. Instruction topics include those for:

    1. Terminating an unqualified participant – defining for the interviewer how to terminate an interview when the participant does not correctly answer the screen or filter questions.

    2. Terminating a discontinued interview – defining for the interviewer how to conclude an interview when the participant decides to discontinue.

    3. Moving between questions on an instrument – defining for an interviewer or participant how to move between questions or topic sections of an instrument (skip directions) when movement is dependent on the specific answer to a question or when branched questions are used.

    4. Disposing of a completed questionnaire – defining for an interviewer or participant completing a self-administered instrument how to submit the completed questionnaire.

  • Conclusion - its role is to leave the participant with the impression that his or her involvement has been valuable. Subsequent researchers may need this individual to participate in new studies.

Overcoming instrument problems

There is no substitute for a thorough understanding of question wording, question content, and question sequencing issues. However, the researcher can do several things to help improve survey results, among them:

  • Build rapport with the participant – most information can be secured by direct undisguised questioning if rapport has been developed. Rapport is particularly useful in building participant interest in the project, and the more interest participants have, the more cooperation they will give.

  • Redesign the questioning process – to improve the quality of answers by modifying the administrative process and the response strategy.

  • Explore alternative response strategies – when drafting the original question, try developing positive, negative, and neutral versions of each type of question. This practice helps to select question wording that minimizes bias.

  • Use methods other than surveying to secure the data.

  • Pre-test all the survey elements – assessment of questions/ instruments before the start of a study.

There are abundant reasons for pretesting individual questions, questionnaires, and interview schedules:

  • Discovering ways to increase participant interest.

  • Increasing the likelihood that participants will remain engaged to the completion of the survey.

  • Discovering question content, wording, and sequencing problems.

  • Discovering target question groups where researcher training is needed.

  • Exploring ways to improve the overall quality of survey data.

 


 

What is the function of measurement scales and how are they used? - Chapter 14

 

To measure: to discover the extent, dimensions, quantity, or capacity of something, especially by comparison with a standard.

Measurement in research consists of assigning numbers to empirical events, objects or properties, or activities in compliance with a set of rules.

This definition implies that measurement is a three-part process:

  1. Selecting observable empirical events.

  2. Developing a set of mapping rules: a scheme for assigning numbers or symbols to represent aspects of the event being measured.

  3. Applying the mapping rule(s) to each observation of that event.
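As a concrete illustration of the three steps, a small sketch in which the observed events (payment methods) and the numeric codes are hypothetical:

```python
# Step 1: the observable empirical events - payment methods seen at a checkout
observations = ["card", "cash", "card", "mobile"]

# Step 2: the mapping rule - a scheme assigning a number to each kind of event
mapping_rule = {"cash": 1, "card": 2, "mobile": 3}

# Step 3: apply the mapping rule to each observation of the event
coded = [mapping_rule[event] for event in observations]
# coded is now [2, 1, 2, 3]
```

The numbers here merely label categories (a nominal mapping); what arithmetic they support depends on the assumptions behind the rule, as the scale types below make explicit.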

Variables being studied in research may be classified as objects or as properties.

Objects include the concepts of ordinary experience, such as touchable items like furniture. Objects also include things that are not as concrete, e.g. genes, attitudes and peer-group pressures.

Properties are the characteristics of the object. A person’s physical properties may be stated in terms of weight or height.

  • Psychological properties: include attitudes and intelligence.

  • Social properties include leadership ability, class affiliation, and status.

In a literal sense, researchers do not measure either objects or properties. They measure indicants of the properties or indicants of the properties of objects. Since each property cannot be measured directly, one must infer its presence or absence by observing some indicant or pointer measurement.

Measurement scales

In measuring, one devises some mapping rule and then translates the observation of property indicants using this rule. Several types of measurement are possible; the appropriate choice depends on what is assumed about the mapping rules. Each one has its own set of underlying assumptions about how the numerical symbols correspond to real-world observations.

Mapping rules have four assumptions:

  1. Numbers are used to classify, group, or sort responses. No order exists.

  2. Numbers are ordered. One number is greater than, less than, or equal to another number.

  3. Differences between numbers are ordered. The difference between any pair of numbers is greater than, less than, or equal to the difference between any other pair of numbers.

  4. The number series has a unique origin indicated by the number zero. This is an absolute and meaningful zero point.

Combinations of these characteristics of classification, order, distance, and origin provide four widely used classifications of measurement scales:

Nominal scales

With these scales, a researcher is collecting information on a variable that naturally (or by design) can be grouped into two or more categories that are mutually exclusive and collectively exhaustive. The only possible arithmetic operation when a nominal scale is employed is the counting of members. Nominal classifications can consist of any number of separate groups if the groups are mutually exclusive and collectively exhaustive. These scales are the least powerful of the four data types. They suggest no order or distance relationship and have no arithmetic origin. Any information a sample element might share about varying degrees of the property being measured is wasted by this scale. The only quantification is the number count of cases in each category (the frequency distribution), so the researcher is restricted to the use of the mode as the measure of central tendency. It can only be concluded which category has the most members. There is no generally used measure of dispersion for nominal scales.

Dispersion: describes how scores cluster or scatter in a distribution. Nominal data are statistically weak, but they can still be useful. One can almost always classify a set of properties into a set of equivalent classes. Nominal measures are especially valuable in exploratory work where the objective is to uncover relationships rather than secure precise measurements. Nominal scales are also widely used in surveys and other research when data are classified by major subgroups of the population.

Classifications such as respondents’ marital status, gender, political orientation, and exposure to a certain experience provide insight into important demographic data patterns.
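For nominal data, then, the frequency distribution and the mode are the only meaningful summaries. A minimal sketch with hypothetical marital-status labels:

```python
from collections import Counter

# Nominal data: category labels with no order or distance between them
marital_status = ["single", "married", "single", "divorced", "single", "married"]

freq = Counter(marital_status)        # the frequency distribution
mode, count = freq.most_common(1)[0]  # the only valid measure of central tendency
# mode == "single": the category with the most members
```

Computing a mean or median of such labels would be meaningless; only counting and comparing frequencies is defensible.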

Ordinal scales

Include the characteristics of the nominal scale plus an indicator of order. Ordinal data require conformity to a logical postulate: If a > b and b > c, then a > c. The use of an ordinal scale implies a statement of ‘greater than’ or ‘less than’ (or equal) without stating how much greater or less. Other descriptions can be used – ‘superior to’, ‘happier than’ etc. An ordinal concept can be extended beyond the three cases used in the simple illustration of a>b>c – any number of cases can be ranked.

Another extension of the ordinal concept occurs when there is more than one property of interest. Examples of ordinal data include attitude and preference scales. Because the numbers used with ordinal scales have only a rank meaning, the appropriate measure of central tendency is the median. The median is the midpoint of a distribution. A percentile or quartile reveals the dispersion. Correlational analysis of ordinal data is restricted to various ordinal techniques. Measures of statistical significance are technically confined to a body of statistics known as nonparametric methods, synonymous with distribution-free statistics.
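With ordinal data the median and the quartiles are the appropriate summaries. A small sketch, using hypothetical satisfaction ranks:

```python
import statistics

# Ordinal data: satisfaction ranks (1 = very dissatisfied ... 5 = very satisfied)
ranks = [2, 4, 3, 5, 1, 4, 3, 3, 2, 4]

median = statistics.median(ranks)              # midpoint of the distribution
q1, q2, q3 = statistics.quantiles(ranks, n=4)  # quartiles reveal the dispersion
# the middle half of the ranks lies between q1 and q3
```

Note that the ranks support greater-than/less-than comparisons but not differences, so the mean is avoided in favour of the median.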

Interval scales

Have the power of nominal and ordinal data plus one additional strength: they incorporate the concept of equality of interval (the scaled distance between 1 and 2 equals the distance between 2 and 3). Calendar time is such a scale. Centigrade and Fahrenheit temperature scales are other examples of classical interval scales. Both have an arbitrarily determined zero point, not a unique origin. Researchers treat many attitude scales as interval.

When a scale is interval and the data are relatively symmetric with one mode, you use the arithmetic mean as the measure of central tendency. When the distribution of scores computed from interval data leans in one direction or the other (skewed right or left), we often use the median as the measure of central tendency and the interquartile range as the measure of dispersion.
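The rule above can be illustrated with two small, hypothetical score sets, one symmetric and one with a long right tail:

```python
import statistics

# Interval-type scores: a symmetric set and a right-skewed set
symmetric = [2, 3, 4, 4, 4, 5, 6]
skewed = [2, 3, 3, 3, 4, 9, 11]  # long right tail

# Symmetric with one mode: mean and median agree, so report the mean.
# Skewed: the mean is pulled toward the tail, so report the median instead.
mean_sym, med_sym = statistics.mean(symmetric), statistics.median(symmetric)
mean_skew, med_skew = statistics.mean(skewed), statistics.median(skewed)
```

In the symmetric set the mean and median coincide; in the skewed set the mean sits well above the median, which is the sign that the median (with the interquartile range) is the safer summary.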

Ratio scales

Incorporate all of the powers of the previous scales plus the provision for absolute zero or origin. Ratio data represent the actual amounts of a variable. Measures of physical dimensions such as weight, height, and distance are examples. In business research, we find ratio scales in many areas – there are money values, population counts, return rates, and productivity rates. For statistical purposes the analyst would use the same statistical techniques as with interval data. All statistical techniques mentioned up to this point are usable with ratio scales. Other manipulations carried out with real numbers may be done with ratio-scale values. Thus, multiplication and division can be used with this scale but not with the others mentioned. Geometric and harmonic means are measures of central tendency, and coefficients of variation may also be calculated for describing variability. Higher levels of measurement generally yield more information.

Because of the measurement precision at higher levels, more powerful and sensitive statistical procedures can be used. When we collect information at higher levels, we can always convert, rescale, or reduce the data to arrive at a lower level.
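Both points (the extra statistics that ratio data allow, and the option of converting down a level) can be sketched with hypothetical monthly incomes:

```python
import statistics

# Ratio data: monthly incomes - a true zero exists, so ratios are meaningful
incomes = [1800, 2400, 2400, 3600, 4800]

ratio = incomes[-1] / incomes[0]  # "earns 2.67 times as much" is a valid statement

# Coefficient of variation: dispersion relative to the mean (needs a true zero)
cv = statistics.stdev(incomes) / statistics.mean(incomes)

# Converting down: reduce the ratio data to ordinal income bands
bands = ["low" if x < 2000 else "middle" if x < 4000 else "high" for x in incomes]
```

The band cut-offs are invented for the example; the point is that the reduction loses information (exact amounts) and cannot be reversed, which is why data should be collected at the highest level the design permits.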

What sources of measurement differences are there?

Since complete control (of a study) is unattainable, error does occur. Much error is systematic (results from bias), while the remainder is random (occurs erratically). There are four major error sources which may contaminate the results:

The respondent

Opinion differences that affect measurement come from relatively stable characteristics of the respondent. Typical of these are employee status, ethnic group membership, social class, etc.

The skilled researcher will anticipate many of these dimensions, adjusting the design to eliminate, neutralize, or otherwise deal with them. Respondents may be reluctant to express strong negative or positive feelings, may purposefully express attitudes that they perceive as different from those of others, or may have little knowledge about something but be reluctant to admit ignorance. This reluctance to admit ignorance of a topic can lead to an interview consisting of ‘guesses’ or assumptions, which, in turn, create erroneous data. Respondents may also suffer from temporary factors like fatigue, boredom, anxiety, hunger, etc.; these limit the ability to respond accurately and fully.

Situational factors

Any condition that places a strain on the interview or measurement session can have serious effects on the interviewer-respondent rapport. If another person is present, that person can distort responses by joining in, by distracting, or by merely being there. If the respondents believe anonymity is not ensured, they may be reluctant to express certain feelings.

What functions does the measurer have?

The interviewer can distort responses by rewording, paraphrasing, or reordering questions. Stereotypes in appearance and action introduce bias. Inflections of voice and conscious or unconscious prompting with smiles, nods, and so forth, may encourage or discourage certain replies. Checking of the wrong response or failure to record full replies will obviously distort findings. In the data analysis stage, incorrect coding, careless tabulation, and faulty statistical calculation may introduce further errors.

The instrument

A defective instrument can cause distortion in two major ways. First, it can be too confusing and ambiguous. The use of complex words and syntax beyond participant comprehension is typical. Leading questions, ambiguous meanings, mechanical defects (inadequate space for replies, response-choice omissions, and poor printing), and multiple questions suggest the range of problems. Many of these problems are the direct result of operational definitions that are insufficient, resulting in an inappropriate scale being chosen or developed. A more elusive type of instrument deficiency is poor selection from the universe of content items. Seldom does the instrument explore all the potentially important issues. Even if the general issues are studied, the questions may not cover enough aspects of each area of concern.

What are the characteristics of good measurement?

The tool should be an accurate counter or indicator of what we are interested in measuring. In addition, it should be easy and efficient to use. There are three major criteria for evaluating a measurement tool:

Validity

Is the extent to which a test measures what we actually wish to measure. This text features two major forms: external and internal validity. The external validity of research findings is the data’s ability to be generalized across persons, settings, and times. Internal validity is further limited in this discussion to the ability of a research instrument to measure what it is purported to measure. One widely accepted classification of validity consists of three major forms:

  1. Content Validity – of a measuring instrument is the extent to which it provides adequate coverage of the investigative questions guiding the study. If the instrument contains a representative sample of the universe of subject matter of interest, then content validity is good. To evaluate the content validity of an instrument, one must first agree on what elements constitute adequate coverage. A determination of content validity involves judgment.

  2. Criterion-Related Validity – reflects the success of measures used for prediction or estimation. You may want to predict an outcome or estimate the existence of a current behaviour or time perspective. An attitude scale that correctly forecasts the outcome of a purchase decision has predictive validity. An observational method that correctly categorizes families by current income class has concurrent validity. Any criterion measure must be judged in terms of four qualities: relevance, freedom from bias, reliability, and availability. A criterion is relevant if it is defined and scored in the terms we judge to be the proper measures of someone’s success. It is free from bias when it gives each person an equal opportunity to score well. A reliable criterion is stable or reproducible: an erratic criterion can hardly be considered a reliable standard by which to judge performance on a sales employment test. Finally, the information specified by the criterion must be available.

If this is not available, how much will it cost and how difficult will it be to secure? The amount of money and effort that should be spent on development of a criterion depends on the importance of the problem for which the test is used. Once there are test and criterion scores, they must be compared in some way.

  3. Construct Validity – to evaluate construct validity, we consider both the theory and the measuring instrument being used. If we were interested in measuring the effect of trust in cross-functional teams, the way in which ‘trust’ was operationally defined would have to correspond to an empirically grounded theory. If a known measure of trust was available, we might correlate the results obtained using this measure with those derived from our new instrument.

Such an approach would provide us with preliminary indications of convergent validity (the degree to which scores on one scale correlate with scores on other scales designed to assess the same construct). Another method of validating the trust construct would be to separate it from other constructs in the theory or related theories. To the extent that trust could be separated from bonding, reciprocity, and empathy, we would have completed the first steps towards discriminant validity (the degree to which scores on a scale do not correlate with scores from scales designed to measure different constructs).
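As an illustrative sketch (not from the text), the snippet below computes convergent and discriminant correlations for a hypothetical new trust instrument using invented scores and a plain Pearson coefficient; the scale names and data are assumptions for demonstration only:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented scores for eight respondents (illustrative only).
new_trust = [4, 5, 3, 5, 2, 4, 1, 3]    # our hypothetical new trust instrument
known_trust = [4, 4, 3, 5, 2, 5, 1, 3]  # an established trust measure
bonding = [2, 5, 4, 1, 3, 2, 5, 4]      # a different construct

convergent = pearson(new_trust, known_trust)   # expected to be high
discriminant = pearson(new_trust, bonding)     # expected to be low
```

A high correlation with the established measure and a low correlation with the bonding scale would be preliminary evidence of convergent and discriminant validity, respectively.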

Reliability

Has to do with the accuracy and precision of a measurement procedure. A measure is reliable to the degree that it supplies consistent results. Reliability is a necessary contributor to validity but is not a sufficient condition for validity. If a measurement is not valid, it hardly matters if it is reliable – because it does not measure what the designer needs to measure in order to solve the research problem. In this context, reliability is not as valuable as validity, but it is much easier to assess.

Reliability is concerned with estimates of the degree to which a measurement is free of random or unstable error. Reliable instruments can be used with confidence that transient and situational factors are not interfering. Reliable instruments are robust; they work well at different times under different conditions.

This distinction of time and condition is the basis of frequently used perspectives on reliability:

Stability – a measure is said to possess stability if consistent results with repeated measurements of the same person with the same instrument can be secured. An observation procedure is stable if it gives the same reading on a particular person when repeated one or more times.

Some of the difficulties that can occur in the test-retest methodology and cause a downward bias in stability include:

  • Time delay between measurements – leads to situational factor changes.

  • Insufficient time between measurements – permits the respondent to remember previous answers and repeat them, resulting in biased reliability indicators.

  • Respondent’s discernment of a study’s disguised purpose – may introduce bias if the respondent holds opinions related to the purpose but not assessed with current measuring questions.

  • Topic sensitivity – occurs when the respondent seeks to learn more about the topic or form new and different opinions before the retest.

A suggested remedy is to extend the interval between test and retest (from two weeks to a month).

Equivalence – a second perspective on reliability considers how much error may be introduced by different investigators (in observation) or different samples of items being studied (in questioning or scales). Thus, while stability is concerned with personal and situational fluctuations from one time to another, equivalence is concerned with variations at one point in time among observers and samples of items. A good way to test for the equivalence of measurements by different observers is to compare their scoring of the same event. In studies where a consensus among experts or observers is required, the similarity of the judges’ perceptions is sometimes questioned. One tests for item sample equivalence by using alternative or parallel forms of the same test administered to the same persons simultaneously. The results of the two tests are then correlated. Under this condition, the length of the testing process is likely to affect the subjects’ responses through fatigue, and the inferred reliability of the parallel form will be reduced accordingly. Some measurement theorists recommend an interval between the two tests to compensate for this problem. This approach, called delayed equivalent forms, is a composite of test-retest and the equivalence method. As in test-retest, one would administer form X followed by form Y to half of the examinees and form Y followed by form X to the other half to prevent ‘order of presentation’ effects.

Internal Consistency – a third approach to reliability uses only one administration of an instrument or test to assess the internal consistency or homogeneity among the items.
The split-half technique can be used when the measuring tool has many similar questions or statements to which participants can respond. The instrument is administered and the results are separated by item into even and odd numbers or into randomly selected halves.

When the two halves are correlated, if the results of the correlation are high, the instrument is said to have high reliability in an internal consistency sense. The high correlation tells us there is similarity (or homogeneity) among the items. The potential for incorrect inferences about high internal consistency exists when the test contains many items – which inflates the correlation index.
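The split-half idea can be sketched as follows. The data are invented, the function names are ours, and the Spearman-Brown correction (a standard step-up formula, not mentioned in the text) is used to estimate full-test reliability from the half-test correlation:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one list of item responses per respondent.
    Items are split into odd- and even-numbered halves, the half
    totals are correlated, and the Spearman-Brown correction steps
    the half-test correlation up to a full-test estimate."""
    odd = [sum(r[0::2]) for r in item_scores]
    even = [sum(r[1::2]) for r in item_scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)

# Invented responses: 5 respondents x 6 similar items (illustrative only).
scores = [
    [5, 4, 5, 5, 4, 4],
    [2, 2, 1, 2, 2, 1],
    [4, 4, 3, 4, 5, 4],
    [3, 2, 3, 3, 2, 3],
    [1, 1, 2, 1, 1, 2],
]
rel = split_half_reliability(scores)
```

With these homogeneous invented items the corrected coefficient comes out close to 1, which is what high internal consistency looks like.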

Practicality

Is concerned with a wide range of factors of economy, convenience, and interpretability. The scientific requirements of a project call for the measurement process to be reliable and valid, while the operational requirements call for it to be practical.

  • Economy – some trade-off usually occurs between the ideal research project and the budget. Data are not free, and instrument length is one area where economic pressures dominate. The choice of data collection method is also often dictated by economic factors.

  • Convenience – a measuring device passes the convenience test if it is easy to administer. A questionnaire or a measurement scale with a set of detailed but clear instructions, with examples, is easier to complete correctly than one that lacks these features. We can also make the instrument easier to administer by giving close attention to its design and layout.

  • Interpretability – this aspect of practicality is relevant when persons other than the test designers must interpret the results. In such cases, the designer of the data collection instrument provides several key pieces of information to make interpretation possible.

Scaling is the ‘procedure for the assignment of numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the properties in question.’ Procedurally, numbers are assigned to indicants of the properties of objects. Thus, one assigns a number scale to the various levels of heat and cold and calls it a thermometer.

How do you select a measurement scale?

Selecting and constructing a measurement scale requires the consideration of several factors that influence the reliability, validity, and practicality of the scale:

Research objectives

Researchers face two general types of scaling objectives:

  1. To measure characteristics of the participants in the study.

  2. To use participants as judges of the objects or indicants presented to them.

With the first study objective, the scale would measure the customers’ orientation as favourable or unfavourable. With the second objective, the same data may be used, but the focus is on how satisfied people are with different design options.

Response types

Measurement scales fall into one of four general types: rating, ranking, categorization, and sorting. A rating scale is used when participants score an object or indicant without making a direct comparison to another object or attitude. Ranking scales constrain the study participant to making comparisons and determining order among two or more properties (or their indicants) or objects. A choice scale requires that participants choose one alternative over another. Categorization asks participants to put themselves or property indicants in groups or categories. Sorting requires that participants sort cards (representing concepts or constructs) into piles using criteria established by the researcher. The cards might contain photos or images or verbal statements of product features.

Degree of preference

Scales may involve either preference measurement or non-preference evaluation. In the former, each respondent is asked to choose the object he or she favours or the solution he or she would prefer. In the latter, respondents are asked to judge which object has more of some characteristic or which solution takes the most resources.

Data properties

Decisions about the choice of measurement scales are often made with regard to the data properties generated by each scale. Scales are classified in increasing order of power; scales are nominal, ordinal, interval, or ratio. Nominal scales classify data into categories without indicating order, distance, or unique origin. Ordinal data show relationships of more than and less than but have no distance or unique origin. Interval scales have both order and distance but no unique origin. Ratio scales possess all four properties: classification, order, distance, and unique origin. The assumptions underlying each level of scale determine how a particular measurement scale’s data will be analysed statistically.
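One way to make the increasing power of these levels concrete is a small lookup, sketched below. The property sets follow the definitions above; the "typical statistic" pairings are a common convention in measurement theory, not a prescription from the text:

```python
# Each measurement level, the data properties it carries, and a statistic
# typically permitted at that level (a common convention, shown for
# illustration).
SCALE_LEVELS = {
    "nominal":  {"properties": {"classification"},
                 "typical_statistic": "mode"},
    "ordinal":  {"properties": {"classification", "order"},
                 "typical_statistic": "median"},
    "interval": {"properties": {"classification", "order", "distance"},
                 "typical_statistic": "arithmetic mean"},
    "ratio":    {"properties": {"classification", "order", "distance",
                                "unique origin"},
                 "typical_statistic": "geometric mean"},
}

def strongest_level(available_properties):
    """Return the most powerful level fully supported by the data's properties.
    Relies on SCALE_LEVELS being listed in increasing order of power."""
    best = "nominal"
    for level, info in SCALE_LEVELS.items():
        if info["properties"] <= set(available_properties):
            best = level
    return best
```

For example, data that can be classified and ordered but carry no distance information should be analysed as ordinal, no matter how numeric they look.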

Number of dimensions

Measurement scales are either uni/one-dimensional or multidimensional. With a uni-dimensional scale, one seeks to measure only one attribute of the participant or object. A multidimensional scale recognizes that an object might be better described with several dimensions than on a uni-dimensional continuum.

Scale construction

We can classify measurement scales by the methods used to build them. There are five of them:

  1. Arbitrary.

  2. Consensus.

  3. Item analysis.

  4. Cumulative.

  5. Factoring.


Number of scale points – a scale should be appropriate for its purpose. For a scale to be useful, it should match the stimulus presented and extract information proportionate to the complexity of the attitude, object, concept, or construct. First, as the number of scale points increases, the reliability of the measure increases. Second, in some studies, scales with 11 points may produce more valid results than 3-, 5-, or 7-point scales. Third, some constructs require greater measurement sensitivity and the opportunity to extract more variance, which additional scale points provide.

Fourth, a larger number of scale points are needed to produce accuracy when using single-dimension versus multiple-dimension scales. Finally, in cross-cultural measurement, the cultural practices may condition participants to a standard metric.

What are rating scales used for?

Rating scales are used to judge properties of objects without reference to other similar objects. These ratings may be in such form as ‘like-dislike’ or other classifications using even more categories.

  1. Simple attitude scales – the simple category scale (also called a dichotomous scale) offers two mutually exclusive response choices. These may be ‘yes’ and ‘no’, ‘important’ and ‘unimportant’. This response strategy is particularly useful for demographic questions or where a dichotomous response is adequate. When there are multiple options for the rater but only one answer is sought, the multiple-choice, single-response scale is appropriate. Both the multiple-choice, single-response scale and the simple category scale produce nominal data. A variation, the multiple-choice, multiple-response scale (also called a checklist) allows the rater to select one or several alternatives. The cumulative feature of this scale can be beneficial when a complete picture of the participant’s choice is desired. This scale generates nominal data. Simple attitude scales are easy to develop, inexpensive, and can be designed to be highly specific. The design approach is subjective. The researcher’s insight and ability offer the only assurance that the items chosen are a representative sample of the universe of attitudes about the attitude object. There is no evidence that each person will view all items with the same frame of reference as will other people.

  2. Likert scales – the Likert scale is the most frequently used variation of the summated rating scale. Summated rating scales consist of statements that express either a favourable or an unfavourable attitude toward the object of interest. The participant is asked to agree or disagree with each statement. Each response is given a numerical score to reflect its degree of attitudinal favourableness, and the scores may be summed to measure the participant’s overall attitude. The Likert scale is easy and quick to construct. Careful researchers ensure that each item meets an empirical test for discriminating ability between favourable and unfavourable attitudes. Likert scales are probably more reliable and provide a greater volume of data than many other scales. The scale produces interval data. Originally, creating a Likert scale involved a procedure known as item analysis. In the first step, a large number of statements were collected that met two criteria: (1) each statement was relevant to the attitude being studied; (2) each was believed to reflect a favourable or unfavourable position on that attitude.

    • People similar to those who were going to be studied were asked to read each statement and to state the level of their agreement with it, using a 5-point scale. To ensure consistent results, the assigned numerical values are reversed if the statement is worded negatively. The two extreme groups represent people with the most favourable and least favourable attitudes toward the attitude being studied. These extremes are the two criterion groups by which individual items are evaluated. Item analysis assesses each item based on how well it discriminates between those persons whose total score is high and those whose total score is low. The mean scores for the high-score and low-score groups are then tested for statistical significance by computing t values. After finding the t values for each statement, they are rank-ordered, and those statements with the highest t values are selected.

  3. Semantic differential scales – the semantic differential (SD) scale measures the psychological meanings of an attitude object using bipolar adjectives. Researchers use this scale for studies such as brand and institutional image. The method consists of a set of bipolar rating scales, usually with 7 points, by which one or more participants rate one or more concepts on each scale item. The SD scale is based on the proposition that an object can have several dimensions of connotative meaning. The meanings are located in multidimensional property space, called semantic space. Connotative meanings are suggested or implied meanings, in addition to the explicit meaning of an object. The semantic differential has several advantages. It is an efficient and easy way to secure attitudes from a large sample. These attitudes may be measured in both direction and intensity. The total set of responses provides a comprehensive picture of the meaning of an object and a measure of the person doing the rating. It is a standardized technique that is easily repeated but escapes many problems of response distortion found with more direct methods. It produces interval data.

  4. Numerical/multiple rating list scales – numerical scales have equal intervals that separate their numeric scale points. The verbal anchors serve as the labels for the extreme points. Numerical scales are often 5-point scales but may have 7 or 10 points. The participants write a number from the scale next to each item. The scale’s linearity, simplicity, and production of ordinal or interval data make it popular for managers and researchers. A multiple rating list scale is similar to the numerical scale but differs in two ways: (1) it accepts a circled response from the rater, and (2) the layout facilitates visualization of the results. The advantage is that a mental map of the participant’s evaluations is evident to both the rater and the researcher. This scale produces interval data.

  5. Stapel scale – is used as an alternative to the semantic differential, especially when it is difficult to find bipolar adjectives that match the investigative question. Suppose, for example, that there are three attributes of corporate image. The scale is composed of the word (or phrase) identifying the image dimension and a set of 10 response categories for each of the three attributes. Fewer response categories are sometimes used. Participants select a plus number for the characteristic that describes the attitude object. The more accurate the description, the larger is the positive number. Similarly, the less accurate the description, the larger is the negative number chosen. Ratings range from +5 to -5, with participants selecting a number that describes the store very accurately to very inaccurately. Like the Likert, SD, and numerical scales, Stapel scales usually produce interval data.

  6. Graphic rating scales – the scale was originally created to enable researchers to discern fine differences. Theoretically, an infinite number of ratings are possible if participants are sophisticated enough to differentiate and record them.

    • They are instructed to mark their response at any point along a continuum. Usually, the score is a measure of length (millimetres) from either endpoint. The results are treated as interval data. The difficulty is in coding and analysis. This scale requires more time than scales with predetermined categories.
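The Likert item-analysis procedure described above (splitting respondents into high- and low-score criterion groups, computing a t value per statement, and keeping the most discriminating statements) can be sketched as follows. The responses are invented and the function names are ours:

```python
from math import sqrt

def t_value(high, low):
    """Welch-style t statistic comparing an item's mean score in the
    high-total and low-total criterion groups."""
    n1, n2 = len(high), len(low)
    m1, m2 = sum(high) / n1, sum(low) / n2
    v1 = sum((x - m1) ** 2 for x in high) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in low) / (n2 - 1)
    return (m1 - m2) / sqrt(v1 / n1 + v2 / n2)

def rank_items(responses, group_size=2):
    """responses: one list of item scores per respondent.
    Forms high/low criterion groups from the total scores, computes a
    t value per item, and returns (t, item_index) pairs, most
    discriminating first."""
    order = sorted(range(len(responses)), key=lambda i: sum(responses[i]))
    low_ids, high_ids = order[:group_size], order[-group_size:]
    ts = []
    for j in range(len(responses[0])):
        high = [responses[i][j] for i in high_ids]
        low = [responses[i][j] for i in low_ids]
        ts.append((t_value(high, low), j))
    return sorted(ts, reverse=True)

# Invented 5-point responses: 6 respondents x 3 candidate statements.
responses = [
    [1, 3, 1],
    [2, 2, 2],
    [3, 3, 3],
    [3, 4, 3],
    [5, 3, 4],
    [4, 4, 5],
]
ranked = rank_items(responses)
```

In this invented data, the middle statement barely separates high and low scorers, so it lands at the bottom of the ranking and would be dropped.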

Errors to avoid with rating scales – Before accepting participants’ ratings, their tendencies to make errors of central tendency and halo effect should be considered. Some raters are reluctant to give extreme judgments, and this fact accounts for the error of central tendency. Participants may also be ‘easy raters’ or ‘hard raters’, making what is called an error of leniency. These errors most often occur when the rater does not know the object or property being rated. To address these tendencies, researchers can:

  • Adjust the strength of descriptive adjectives.

  • Space the intermediate descriptive phrases farther apart.

  • Provide smaller differences in meaning between the steps near the ends of the scale than between the steps near the centre.

  • Use more points in the scale.

The halo effect: the systematic bias that the rater introduces by carrying over a generalized impression of the subject from one rating to another. The halo effect is especially difficult to avoid when the property being studied is not clearly defined, is not easily observed, is not frequently discussed, involves reactions with others, or is a trait of high moral importance.

Ways of counteracting the halo effect include having the participant rate one trait at a time, revealing one trait per page, or periodically reversing the terms that anchor the endpoints of the scale, so positive attributes are not always on the same end of each scale.

What are ranking scales used for?

In ranking scales, the participant directly compares two or more objects and makes choices among them.

Frequently, the participant is asked to select one as the ‘best’ or the ‘most preferred’. When there are only two choices, this approach is satisfactory, but it often results in ties when more than two choices are presented. Using the paired-comparison scale, the participant can express attitudes unambiguously by choosing between two objects. The number of judgments required in a paired comparison is [(n)(n-1)/2], where n is the number of stimuli or objects to be judged. Reducing the number of comparisons per participant without reducing the number of objects can lighten this burden. Each participant can be presented with only a sample of the stimuli. In this way, each pair of objects must be compared an equal number of times. Another procedure is to choose a few objects that are believed to cover the range of attractiveness at equal intervals. All other stimuli are then compared to these few standard objects.

Paired comparisons run the risk that participants will tire to the point that they give ill-considered answers or refuse to continue. A paired comparison provides ordinal data.
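The judgment count [(n)(n-1)/2] is easy to verify in code; the function name is ours:

```python
def paired_comparisons(n):
    """Number of judgments needed to compare n objects pairwise: n(n-1)/2."""
    return n * (n - 1) // 2

# The burden grows quickly: 2 objects need 1 judgment, 10 need 45,
# and 15 already need 105, which is why the text suggests sampling
# the stimuli or comparing against a few standard objects.
```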

The forced ranking scale lists attributes that are ranked relative to each other. This method is faster than paired comparisons and is usually easier and more motivating to the participant. A drawback to forced ranking is the number of stimuli that can be handled by this method. In addition, rank ordering produces ordinal data since the distance between preferences is unknown. Often the manager is interested in benchmarking. This calls for a standard by which other programs, processes, brands, or people can be compared.

The comparative scale is ideal for such comparisons if the participants are familiar with the standard. Some researchers treat the data produced by comparative scales as interval data since the scoring reflects an interval between the standard and what is being compared. The rank or position of the item would be treated as ordinal data unless the linearity of the variables in question could be supported.

The method of successive intervals is often used to sort the items into piles or groups that represent a succession of values.

Arbitrary scales are designed by collecting several items that are unambiguous and appropriate to a given topic. These scales are not only easy to develop, but also inexpensive and can be designed to be highly specific. Moreover, arbitrary scales provide useful information and are adequate if developed skillfully.

Consensus scaling requires items to be selected by a panel of judges, who then evaluate them on:

  • Their relevance to the topic area.

  • Their potential for ambiguity.

  • The level of attitude they represent.

In this field, the Thurstone equal-appearing interval scale is especially well known.

Item analysis scaling is the procedure for evaluating an item based on how well it discriminates between those persons whose total score is high and those whose total score is low. The most popular scale using this approach is the Likert scale.

What are cumulative scales used for?

Total scores on cumulative scales have the same meaning. Given the person’s total score, it is possible to estimate which items were answered positively and negatively. A pioneering scale of this type was the scalogram.

Scalogram analysis is a procedure for determining whether a set of items forms a uni-dimensional scale. A scale is uni-dimensional if the responses fall into a pattern in which endorsement of the item reflecting the extreme position results in endorsing all items that are less extreme.

The scalogram and similar procedures for discovering underlying structure are useful for assessing attitudes and behaviours that are highly structured, such as social distance, organizational hierarchies, and evolutionary product stages.
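The scalogram's unidimensionality condition can be checked mechanically: with items ordered from least to most extreme, each respondent's endorsements must form a run of 'yes' answers followed only by 'no' answers, since endorsing an extreme item implies endorsing every less extreme one. A minimal sketch (our own function name, invented response patterns):

```python
def is_scalogram_pattern(responses):
    """responses: per-respondent endorsement lists (True/False), with items
    ordered from least to most extreme. Returns True if every row is a run
    of True values followed only by False values, i.e. the set of items
    forms a perfect uni-dimensional (Guttman-style) scale."""
    for row in responses:
        seen_false = False
        for endorsed in row:
            if endorsed and seen_false:
                return False  # a more extreme item endorsed after a refusal
            if not endorsed:
                seen_false = True
    return True
```

Real data rarely fit perfectly, so scalogram analysis in practice measures how closely observed patterns approach this ideal rather than demanding it exactly.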

Factor scales include a variety of techniques that have been developed to address two problems:

  1. How to deal with a universe of content that is multidimensional?

  2. How to uncover underlying dimensions that have not been identified by exploratory research?

Factoring develops measurement questions through factor analysis or similar correlation techniques. It is particularly useful in uncovering latent attitude dimensions, and it approaches sampling through the concept of multidimensional attribute space. The semantic differential scale is an example.

Other developments in scaling include multidimensional scaling and conjoint analysis. Each represents a family of related techniques with a variety of applications for handling complex judgments. Magnitude estimation and Rasch models provide an avenue for reconceptualising traditional scaling techniques for greater efficiency and freedom from error.

 

To measure: to discover the extent, dimensions, quantity, or capacity of something, especially by comparison with a standard.

 

Measurement in research consists of assigning numbers to empirical events, objects or properties, or activities in compliance with a set of rules.

This definition implies that measurement is a three-part process:

  1. Selecting observable empirical events.

  2. Developing a set of mapping rules: a scheme for assigning numbers or symbols to represent aspects of the event being measured.

  3. Applying the mapping rule(s) to each observation of that event.
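The three-part process can be sketched in a few lines; the category labels and assigned numbers below are our own illustrative example, not prescribed by the text:

```python
# Step 2: a mapping rule, a scheme assigning numbers to aspects of the
# observed event (here, invented agreement categories on a 5-point scale).
AGREEMENT_RULE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def measure(observations, rule):
    """Step 3: apply the mapping rule to each observation of the event
    (step 1 being the selection of the observable events themselves)."""
    return [rule[obs] for obs in observations]
```

For instance, `measure(["agree", "neutral"], AGREEMENT_RULE)` maps two observed responses to the numbers 4 and 3.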

Variables being studied in research may be classified as objects or as properties.
