
Outliers and Influential Points

Author: Sophia

what's covered
This tutorial covers outliers and influential points in bivariate data. Our discussion breaks down as follows:

1. Outliers
2. Influential Points

1. Outliers

You may recall the term "outliers" when talking about univariate data. However, in bivariate data, outliers are a little bit different.

An outlier is any point that deviates substantially from the overall form of the remainder of the data points.

EXAMPLE

Let's take a look at these two data sets. One thing you might notice is that the x-values in Table 1 look fairly varied, whereas in Table 2 every x-value except one is 8, which might be a clue to something.
Table 1          Table 2
x     y          x     y
10    746        8     658
8     677        8     576
13    1274       8     771
9     711        8     884
11    781        8     847
14    884        8     704
6     608        8     525
4     539        19    1250
12    815        8     556
7     642        8     791
5     573        8     689

However, if you calculate the means and standard deviations, you will find that the two data sets have essentially the same mean of x, the same mean of y, the same standard deviation of x, and the same standard deviation of y. Their correlations are also the same: about 0.816, a positive correlation.
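The claim above can be checked directly. The following is a minimal sketch, assuming the values are transcribed correctly from Table 1 and Table 2 and that the standard deviations are sample standard deviations (dividing by n - 1). The names `table1`, `table2`, `pearson_r`, and `summarize` are illustrative and not part of the tutorial.

```python
from math import sqrt
from statistics import mean, stdev

# (x, y) pairs transcribed from Table 1 and Table 2.
table1 = [(10, 746), (8, 677), (13, 1274), (9, 711), (11, 781), (14, 884),
          (6, 608), (4, 539), (12, 815), (7, 642), (5, 573)]
table2 = [(8, 658), (8, 576), (8, 771), (8, 884), (8, 847), (8, 704),
          (8, 525), (19, 1250), (8, 556), (8, 791), (8, 689)]


def pearson_r(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarize(points):
    """Means, sample standard deviations, and correlation of a data set."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "sd_x": stdev(xs),   # sample standard deviation (assumption)
        "sd_y": stdev(ys),
        "r": pearson_r(points),
    }


summary1 = summarize(table1)
summary2 = summarize(table2)
for label, summary in (("Table 1", summary1), ("Table 2", summary2)):
    print(label, {key: round(value, 3) for key, value in summary.items()})
```

Running it shows that the two summaries agree closely even though the raw values look very different, which is why the graphs are worth examining.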

Based on that information, one would think that these two graphs will look fairly similar. Let's take a look:

[Graph 1 and Graph 2: scatterplots of the two data sets]

Both graphs have an outlier that does not follow the overall trend of the graph. Depending on the pattern, the outlier could be an extreme x-value, an extreme y-value, extreme for both the x- and y-values, or neither.

Types of Outliers

Extreme x-values (outlier in the x-direction)
This point is an outlier in the x-direction because it lies much farther to the right than the rest of the points. It is not an outlier in the y-direction: looking horizontally, its y-value falls in the lower-middle part of the range of the other points.

Extreme y-values (outlier in the y-direction)
This point is an outlier in the y-direction because it is much higher than the other points, but its x-value is not extreme.

Extreme x- and y-values (outlier in both the x- and y-directions)
This point is an outlier in both the x- and y-directions because it is much farther to the right and also much higher than the rest of the points.

Neither extreme x- nor y-values (outlier in neither the x- nor the y-direction)
Even though this point is not extreme in either the x- or y-direction, it doesn't fit the overall trend established by the rest of the data.
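To make the x-direction and y-direction distinction concrete, here is a rough sketch that checks whether a chosen point is extreme in each coordinate. The rule it uses (more than 3 sample standard deviations from the mean of the other points) is an assumption made for illustration; the tutorial does not prescribe a numeric cutoff. The function names `is_extreme` and `extreme_directions`, and the choice of which point to test, are also hypothetical.

```python
from statistics import mean, stdev

# (x, y) pairs transcribed from Table 1 and Table 2.
table1 = [(10, 746), (8, 677), (13, 1274), (9, 711), (11, 781), (14, 884),
          (6, 608), (4, 539), (12, 815), (7, 642), (5, 573)]
table2 = [(8, 658), (8, 576), (8, 771), (8, 884), (8, 847), (8, 704),
          (8, 525), (19, 1250), (8, 556), (8, 791), (8, 689)]


def is_extreme(value, others, k=3.0):
    """Assumed rule of thumb: value is more than k sample SDs from the mean of the others."""
    center = mean(others)
    spread = stdev(others)
    if spread == 0:
        # The other values are all identical, so any different value stands out.
        return value != center
    return abs(value - center) > k * spread


def extreme_directions(points, index, k=3.0):
    """Report whether points[index] is extreme in x and in y relative to the rest."""
    px, py = points[index]
    rest = points[:index] + points[index + 1:]
    return {
        "x": is_extreme(px, [x for x, _ in rest], k),
        "y": is_extreme(py, [y for _, y in rest], k),
    }


# Points picked by eye as the candidates that stand out in each table.
flags1 = extreme_directions(table1, table1.index((13, 1274)))
flags2 = extreme_directions(table2, table2.index((19, 1250)))
print("Table 1 point (13, 1274):", flags1)
print("Table 2 point (19, 1250):", flags2)
```

A coordinate-by-coordinate check like this cannot detect the fourth type of outlier, the one that is extreme in neither direction but still breaks the overall trend. Spotting that kind requires looking at the scatterplot or at how far the point sits from the pattern of the other points.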

term to know
Outlier
Point that deviates substantially from the overall form of the remainder of the data points.


2. Influential Points

Influential points are points that, if removed, significantly change a statistical measure. Usually, the measure that we're talking about changing is correlation, but it could also affect other measurements such as the mean of x or y and the standard deviation of x or y.

Some outliers are influential, and some are not.

EXAMPLE

When the scatterplot on the left includes the outlier, the correlation coefficient is 0.816. However, when we remove the outlier, the correlation coefficient changes to 1. Since this dramatically changes the correlation, this outlier would be considered an influential point.
With outlier: r = 0.816
Without outlier: r = 1
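The effect of removing the outlier can be reproduced in a few lines. This sketch assumes the example refers to the Table 1 data and that the outlier is the point (13, 1274), whose y-value sits far above the rest; both are assumptions based on the data, since the graph itself is not reproduced here. The helper `pearson_r` is an illustrative name.

```python
from math import sqrt
from statistics import mean

# (x, y) pairs transcribed from Table 1.
table1 = [(10, 746), (8, 677), (13, 1274), (9, 711), (11, 781), (14, 884),
          (6, 608), (4, 539), (12, 815), (7, 642), (5, 573)]
assumed_outlier = (13, 1274)  # assumption: the point that breaks the trend


def pearson_r(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


without_outlier = [p for p in table1 if p != assumed_outlier]
r_with = pearson_r(table1)
r_without = pearson_r(without_outlier)
print(f"with outlier:    r = {r_with:.3f}")
print(f"without outlier: r = {r_without:.3f}")
```

The remaining ten points fall almost exactly on a straight line, which is why the correlation jumps to essentially 1 once the outlier is gone.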

EXAMPLE

When the scatterplot below includes the outlier, the mean of x is 9, the standard deviation of x is 3.3, and the correlation is 0.816. When we remove the outlier, the mean of x becomes 8 because all the remaining x-values are 8, and the standard deviation of x becomes 0 because the x-values never deviate from 8. The correlation is reported as 0; strictly speaking, r is undefined when x does not vary, because its formula divides by the standard deviation of x. Either way, there is no longer any linear trend to measure. The outlier changes all of these measures substantially just by being there, so it is certainly influential.
With outlier:
mean of x = 9
standard deviation of x = 3.3
r = 0.816
Without outlier:
mean of x = 8
standard deviation of x = 0
r = 0
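These figures can be verified with a short sketch, assuming the example refers to the Table 2 data and that the outlier is the point (19, 1250), the only one whose x-value is not 8. The helper `pearson_r_or_none` is a hypothetical name. It returns `None` when either variable has no variation, because the usual correlation formula would then divide by zero; the tutorial reports that case as r = 0.

```python
from math import sqrt
from statistics import mean, stdev

# (x, y) pairs transcribed from Table 2.
table2 = [(8, 658), (8, 576), (8, 771), (8, 884), (8, 847), (8, 704),
          (8, 525), (19, 1250), (8, 556), (8, 791), (8, 689)]
assumed_outlier = (19, 1250)  # assumption: the only point with x different from 8


def pearson_r_or_none(points):
    """Pearson r, or None when x or y has zero variation (r is undefined)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def x_summary(points):
    """Mean of x, sample standard deviation of x, and correlation."""
    xs = [x for x, _ in points]
    return mean(xs), stdev(xs), pearson_r_or_none(points)


with_stats = x_summary(table2)
without_stats = x_summary([p for p in table2 if p != assumed_outlier])
for label, (mean_x, sd_x, r) in (("with outlier", with_stats),
                                 ("without outlier", without_stats)):
    r_text = "undefined" if r is None else f"{r:.3f}"
    print(f"{label}: mean of x = {mean_x}, sd of x = {sd_x:.1f}, r = {r_text}")
```

A single point accounts for all of the variation in x here, so removing it collapses the standard deviation to 0 and leaves no linear relationship to measure.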

EXAMPLE

The outlier in the scatterplot below does not have a great effect on the correlation or on the least-squares regression line fitted to this data set. In this case, a line is an inappropriate model, but if you did fit a line, keeping or removing this point wouldn't change that line or the correlation very much.
Non-Influential Outlier

term to know
Influential Point
An observation that, if removed, significantly changes a statistical measure.

summary
Two kinds of points deserve special attention on a scatterplot: outliers and influential points. Influential points are points that, if removed, substantially change at least one statistical measure. Outliers are simply points that deviate from the overall form of the rest of the points. They may be outliers in the x- or y-direction but don't have to be, according to this definition. Be aware that different people use different definitions of outliers for scatterplots, so there's no single hard-and-fast definition.

Good luck!

Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.
