In this tutorial, you're going to learn about the principles of experimental design. The term experimental design simply refers to the way in which an experiment is carried out. Now, an experiment can be designed well or designed poorly, and good designs share certain key components. Let's check them out.
The three big ideas in a well-designed experiment are control, randomization, and replication. And we'll go through each of these in depth through an example involving a farmer. So let's take a look. Control means holding constant everything other than the treatment you're trying to study. And the purpose is to be able to tell whether or not your treatment is effective. If you observe differences between the groups, you want to determine whether they are due to the treatments or to some confounding variable. And so controlling all those other variables helps to limit confounding.
So for example, say a farmer wants to try a new type of fertilizer in his fields. One thing he could do is choose 10 fields with similar soil nutrients, sunlight, and water, all the things that could affect crop growth. And he could then apply the old fertilizer to five fields and the new fertilizer to the other five.
And by keeping all the other variables (soil nutrients, sunlight, water, and so on) consistent, he can attribute any differences between the fields to the fertilizer they received. And this is what an experiment is going to try and do: answer the question, does the fertilizer work? Is it effective? And that's the idea behind controlling for all of these other variables.
The second big idea of experimental design is randomization. And I know this is a lot of writing, but bear with us. The treatments must be assigned to the subjects using some random process. And so we've talked about how to randomly select a sample from a population, and it's the same process for randomly assigning treatments to the individuals in an experiment.
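As a rough illustration, here is a minimal Python sketch of one way the farmer could randomly assign the two fertilizers to his 10 fields. The field names, the function name assign_treatments, and the seed are all made up for this example; any random process that gives every field the same chance of getting either fertilizer would serve.

```python
import random

def assign_treatments(fields, treatments=("old", "new"), seed=None):
    """Randomly split the fields into equal-sized treatment groups.

    Hypothetical helper for illustration only; the names and the
    equal-group-size requirement are assumptions of this sketch.
    """
    if len(fields) % len(treatments) != 0:
        raise ValueError("fields must divide evenly among treatments")
    rng = random.Random(seed)      # a seed just makes the shuffle repeatable
    shuffled = list(fields)        # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    assignment = {}
    for i, treatment in enumerate(treatments):
        # Each consecutive slice of the shuffled list gets one treatment.
        for field in shuffled[i * group_size:(i + 1) * group_size]:
            assignment[field] = treatment
    return assignment

# Ten made-up field labels standing in for the farmer's ten similar fields.
fields = [f"field_{i}" for i in range(1, 11)]
assignment = assign_treatments(fields, seed=42)
for field in fields:
    print(field, "->", assignment[field])
```

Whatever the shuffle produces, five fields end up with the old fertilizer and five with the new one, and the farmer has no say in which is which.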
The purpose of random assignment is to try and filter out all the other sources of variation that we couldn't think to control for. So for example, even though we made the fields as similar as possible with respect to water, sunlight, and so on, it's possible that there is a variable that we didn't think to control for.
Maybe some fields have moles under the ground, and that affects how the crops grow. We wouldn't have known that before we started, so we wouldn't know to control for it. By randomly assigning treatments to the fields, we can hopefully get some fields with moles with the new fertilizer and some fields with moles with the old fertilizer. And so the idea is that it smooths out those effects, or it allows us to see through all the noise that these other variables might bring into the equation.
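To see that smoothing-out at work, here is a small simulation sketch. It assumes, purely for illustration, that 4 of the 10 fields have moles we don't know about, and it repeats the random assignment many times to see how many mole fields land in the new-fertilizer group on average. The numbers and the function name are hypothetical.

```python
import random

def mole_fields_in_new_group(n_fields=10, n_with_moles=4, rng=random):
    """Count hidden-mole fields that a random assignment gives the new fertilizer.

    Assumption for this sketch: the first n_with_moles fields have moles.
    The experimenter never sees this list; only the simulation does.
    """
    has_moles = [True] * n_with_moles + [False] * (n_fields - n_with_moles)
    # Pick half of the fields at random to receive the new fertilizer.
    new_group = rng.sample(range(n_fields), n_fields // 2)
    return sum(has_moles[i] for i in new_group)

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 10_000
counts = [mole_fields_in_new_group(rng=rng) for _ in range(trials)]
average = sum(counts) / trials
print(f"average mole fields receiving the new fertilizer: {average:.2f}")
```

The average should come out close to 2, which is half of the 4 mole fields, so in the long run neither fertilizer is favored by the moles. Any single assignment can still be lopsided, which is part of why replication matters too.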
Randomizing also helps us to avoid bias, because we're not going to be tempted to assign treatments to the experimental units we think might give favorable outcomes. So randomly assigning helps us to avoid biases in that regard. Now, one important note: randomization in an experiment does not serve the same purpose as random selection in a sample. When we do a simple random sample, the idea is to get a sample that's representative of the population.
In an experiment, the purpose of randomly assigning individuals to groups is to try and filter out the unknown sources of variation. So they serve different purposes, although the randomness is similar. The assignment in an experiment is pretty similar to the way you would randomly select in a sample.
And then finally, replication-- replication is the last key idea in experimental design. The idea is that a bigger sample is better, basically. Repeating the experiment on many subjects or experimental units is better than using just a few. Why is that? Well, a larger experiment makes it more likely that we can detect trends that we wouldn't have found in a smaller one.
So let's go back to the example of the farmer. The farmer could have just found two fields that were similar to each other, instead of 10 fields that were similar, and randomly assigned one to get the new fertilizer and one to get the old. But think about it. Isn't it possible in that case that maybe the field with the old fertilizer does very well just by random chance? And that makes it seem like the new fertilizer is not effective when maybe it is. Or the reverse could happen, where it seems like the new fertilizer is effective when it's not.
By assigning each fertilizer to five fields instead of just one, it's more likely that he's going to find trends that he can trust. So the point is, the more you replicate, the more experimental units you can get into your experiment, the better it's going to be, and the more likely it is that you're going to find the true trends rather than some freak anomaly.
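Here is a sketch of that argument as a simulation. Everything numeric in it is an assumption chosen for illustration: yields are drawn from a normal distribution, the new fertilizer truly adds 5 units to an average yield of 100, and field-to-field noise has a standard deviation of 10. It compares how often the new fertilizer's group average comes out ahead with one field per fertilizer versus five.

```python
import random
from statistics import mean

def new_looks_better(fields_per_group, true_gain=5.0, noise_sd=10.0, rng=random):
    """Run one simulated experiment; True if the new group's mean yield is higher.

    All numbers are made-up assumptions: baseline yield 100, a real gain of
    true_gain for the new fertilizer, and normally distributed field noise.
    """
    old = [rng.gauss(100.0, noise_sd) for _ in range(fields_per_group)]
    new = [rng.gauss(100.0 + true_gain, noise_sd) for _ in range(fields_per_group)]
    return mean(new) > mean(old)

def correct_rate(fields_per_group, trials=10_000, seed=1):
    """Fraction of simulated experiments in which the truly better fertilizer wins."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = sum(new_looks_better(fields_per_group, rng=rng) for _ in range(trials))
    return hits / trials

rate_one = correct_rate(1)    # one field per fertilizer (two fields total)
rate_five = correct_rate(5)   # five fields per fertilizer (ten fields total)
print(f"new fertilizer comes out ahead, 1 field each:  {rate_one:.1%}")
print(f"new fertilizer comes out ahead, 5 fields each: {rate_five:.1%}")
```

Under these assumptions, the five-field design should point the right way noticeably more often than the one-field design. The exact rates depend entirely on the assumed gain and noise, so treat them as a demonstration of the idea and not as a statement about real crops.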
And so to recap, the main elements in a well-designed experiment are control, randomization, and replication. Control, again, helps to isolate the effects of the treatments, randomization helps to make the groups as similar as we can and helps to avoid bias, and then replication helps us to see differences that might not have been evident if we had used only a few experimental units. And so we've talked about treatments, which are the things that you apply to your experimental units, and then control, randomization, and replication. Good luck. We'll see you next time.