Table of Contents |
In earlier tutorials, we emphasized the significance of addressing missing data. If missing data is not identified and remedied, it can introduce inaccuracies in the results based on the analysis. Identifying missing data allows us to choose the appropriate method to remedy the missing data. When possible, you want to impute, meaning fill in, missing values in an effort to prevent the deletion of a row or observation from your data set.

Identifying missing data in Excel can be performed in several ways. You will explore two effective techniques in this section. One method involves using an Excel formula, and one method uses filtering to detect missing values.
As you look across the data columns in the Netflix data file, you observe that there are missing values scattered throughout the columns.
EXAMPLE
You can use the Excel COUNTIFS() function to detect if a particular column contains missing values. For instance, using COUNTIFS() you can determine that there are 27 missing values in the director column.The next example will show you how to filter to determine if a column contains missing values.
EXAMPLE
Using a filter, you can determine the number of missing values in the budget column. The filter result, shown below, returns five missing values.
Once you have detected that you have missing values, you then must decide what to do with the missing values. While ultimately how you handle the missing values depends largely on the context of the business scenario, in most circumstances, you will want to handle the missing values in some manner rather than just deleting the observations with the missing values. In the next section, we will make some presumptions about the analysis goals of the Netflix data to illustrate how you can use Excel to remedy the missing values you have detected.
You have just determined that the director column has 27 missing values. Approximately 30% of this column has missing values. Under a circumstance like this, you would not want to ignore the missing values. If you worked at Netflix or wanted to invest the time, you could impute the missing values by finding the directors of the 27 titles that are missing. Let’s presume you do not want to invest this amount of time. For example, suppose you were going to create a report related to the director column. You do not want missing values in the report because you want the report to look professional. So, you are going to impute the missing values with the value “Unknown.” This way, the data will no longer be missing, and the imputed value represents a true characterization of your knowledge of the missing director's values.
EXAMPLE
You can use a combination of the IF() and ISBLANK() functions to impute the missing values of the director column with the value “Unknown.” You can create a new column, director_no_miss that contains the imputed values of the director.
EXAMPLE
You can use Excel to impute missing values for a quantitative variable as well. In a previous example, you determined that five TV shows that lasted 1 Season had missing values for budget.
Source: THIS TUTORIAL WAS AUTHORED BY SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.