This tutorial covers selection bias. Selection bias goes by several other names; it's also called selection effect or undercoverage bias. Selection bias occurs when some subjects are systematically excluded from the possibility of being part of the sample. The key word in this definition is "systematically."
The exclusion is systematic when some particular factor causes certain people not to be included. Obviously, when you take a sample, not every member of the population will end up being included. But people should be randomly selected, so random chance alone should determine whether or not someone ends up in the sample.
If something else is involved, if there's some sort of systematic difference between the people who are included and those who are not, then you have an issue with bias, and in particular with selection bias, also known as undercoverage bias. This type of bias can result in an unrepresentative sample. Here's an example.
If you're surveying telephone numbers and you choose to use a phone book, you'd end up with an unrepresentative sample. The numbers in a phone book today tend to belong to older people. Most younger people have cell phones, don't have a landline, and are not listed in any phone book. And among older people, women living alone tend not to list their phone numbers in a phone book.
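To see why sampling from a phone book skews the result, here is a minimal Python simulation. The population, the listing probabilities, and the sample sizes are all made-up illustrative numbers, not real survey data; the point is only that sampling from the listed subset shifts the estimate while a simple random sample does not.

```python
import random

random.seed(42)

# Hypothetical population: each person has an age and a flag for whether
# their number appears in the phone book. Assumption (for illustration
# only): older people are far more likely to be listed.
population = []
for _ in range(100_000):
    age = random.randint(18, 90)
    listed = random.random() < (0.05 if age < 40 else 0.60)
    population.append((age, listed))

true_mean_age = sum(age for age, _ in population) / len(population)

# Phone-book sample: we can only reach listed numbers, so younger people
# are systematically excluded.
phone_book = [age for age, listed in population if listed]
book_sample = random.sample(phone_book, 1000)

# Simple random sample of the whole population: no systematic exclusion.
random_sample = random.sample([age for age, _ in population], 1000)

print(f"true mean age:     {true_mean_age:.1f}")
print(f"phone-book sample: {sum(book_sample) / len(book_sample):.1f}")
print(f"random sample:     {sum(random_sample) / len(random_sample):.1f}")
```

Under these assumed listing rates, the phone-book estimate comes out noticeably older than the true population mean, while the random sample stays close to it.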
So because younger people and people who choose not to list their numbers are systematically excluded, we're not going to have a representative sample. Now, the way to fix this is through something called random digit dialing. With random digit dialing, the call list includes cell phones, landlines, and unlisted numbers.
The way this happens is that a computer randomly selects an area code, for example 612 for Minneapolis, and then randomly generates the remaining seven digits to come up with a telephone number. Because the digits are randomized and not drawn from a particular list, it's possible to reach any phone number that exists, so cell phones and unlisted numbers are also included. With random digit dialing, we no longer have the issue of systematically excluding younger people or people who choose not to list their numbers, so we have a more representative sample and no issue of bias.
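The mechanism just described can be sketched in a few lines of Python. This is a simplified illustration, not a production dialing system: 612 is the Minneapolis area code mentioned above, the second area code and the number formatting are illustrative, and real random digit dialing also handles invalid exchanges and other numbering rules.

```python
import random

random.seed(0)

def random_digit_dial(area_codes):
    """Generate one phone number by random digit dialing: pick an area
    code at random, then randomize the remaining seven digits. Because
    the digits are random rather than drawn from a list, unlisted
    numbers and cell phones can be reached too."""
    area = random.choice(area_codes)
    rest = "".join(str(random.randint(0, 9)) for _ in range(7))
    return f"({area}) {rest[:3]}-{rest[3:]}"

# Build a small call list from two illustrative area codes.
call_list = [random_digit_dial(["612", "651"]) for _ in range(5)]
for number in call_list:
    print(number)
```

Every number in the call list is equally likely regardless of whether it appears in any phone book, which is exactly what removes the systematic exclusion.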
One organization that uses random digit dialing today is Gallup. Gallup conducts public opinion polls, and its goal is to survey a small number of people in order to accurately represent the adult population of a country. Gallup often does this with phone interviews, and because the phone-interview sample needs to be representative, it uses random digit dialing.
This has been your tutorial on selection bias.
This tutorial looks at two types of bias: deliberate bias and unintentional bias. Unintentional bias is not done on purpose; the researcher is not intentionally misleading. It comes from an error in the study design.
The other forms of bias that other tutorials have looked at, such as selection bias, response bias, non-response bias, and measurement bias, are unintentional forms of bias as long as the researcher introduces them accidentally and not on purpose. On the other hand, we have deliberate bias. This is the one to be concerned about: the researcher is motivated to purposely misrepresent results or to purposely design the study in a particular way.
The researcher is trying to advance an interest. That interest could be financial, ideological, or personal. One way of doing this is designing poll questions to push a certain response. We'll look at a couple of examples on the next slide.
An example of deliberate bias is the aspartame studies. There was a long series of controversial reports showing alternately that aspartame did harm people and that it didn't. Part of the issue was that soda companies, which use a lot of aspartame in their diet brands, were the ones publishing the studies that said aspartame had no effect.
When other people tried to replicate the studies on their own, they found that aspartame did have a negative effect. But those people were also motivated; they were trying to show that the soda companies were wrong. So when it came to the results, no one knew whom to trust, because both sides had different motivations and were pushing a certain financial or ideological purpose.
Cigarette studies have done something similar. Several cigarette companies came out with surveys suggesting that cigarettes weren't as damaging as claimed. One case happened recently when a company partnered with the University of California so that both could evaluate the same set of data.
When the cigarette company went through the data, it slightly tweaked the evaluation protocol and found fairly low levels of toxicity in cigarettes. When the University of California replicated the study, it maintained the protocols as originally set up and found fairly high levels of toxicity. So one way of catching or correcting deliberate bias is having other people reproduce your work, or conducting peer reviews.
Another key point is that authors should note conflicts of interest. Again with cigarettes, one researcher published a result claiming that 80% of lung cancers could be prevented. This was an astonishing result.
However, people later found out that she had been heavily funded by a cigarette company. They felt they could no longer trust her conclusions, because she hadn't revealed the funding up front, and because a cigarette company would like to be able to say that lung cancer can be prevented. Noting conflicts of interest up front helps people decide in real time whether or not to trust your information.
Unintentional bias, on the other hand, is not done on purpose. One example would be running a phone survey during the day. You're not intentionally excluding everyone who works, but because of your design, you end up doing that. By correcting your design, you can correct the bias.
Another case would be not using a placebo. Without a placebo, you open the experiment to experimenter effects, and you let participants' knowledge of whether they're in the control group or the treatment group bias the results. That can be corrected by using a placebo.
Deliberate bias can be corrected as well, but you'd have to redo the study with unbiased people involved and with researchers who aren't motivated to purposely skew the study in a certain way. This is the end of your tutorial on deliberate and unintentional bias.